Multiplication of multi-precision numbers having a size of a power of two

ABSTRACT

Multi-precision multiplication methods include storing a first operand and a second operand as a first array and a second array of n words. A first weighted sum is determined from multiple subproducts of corresponding words of the first operand and the second operand. The methods may further include iteratively determining a next weighted sum from a previous weighted sum and a recursively calculated intermediate product. The disclosed methods can be used in a variety of different applications (e.g., cryptography) and can be implemented in a number of software or hardware environments.

CROSS REFERENCE TO RELATED APPLICATIONS

[0001] This application claims the benefit of U.S. Provisional Patent Application No. 60/401,589, filed Aug. 6, 2002, and U.S. Provisional Patent Application No. 60/419,204, filed Oct. 16, 2002, both of which are incorporated herein by reference.

FIELD OF INVENTION

[0002] This application relates to the multiplication of multi-precision numbers in a variety of different applications, including cryptography.

BACKGROUND

[0003] Performing mathematical operations on large numbers can be a time-consuming and resource-intensive process. One method of handling large numbers involves dividing the numbers into smaller divisions, or words, having a fixed length. Numbers divided in this manner are termed “multi-precision” numbers. In the field of digital circuits, for instance, the binary representation of a large number can be stored in multiple words, wherein each word has a fixed length of n bits depending on the word size supported by the associated hardware or software. Although adding and subtracting multi-precision numbers can be performed relatively efficiently, multi-precision multiplication is much more complex and creates a significant bottleneck in applications using multi-precision arithmetic.

[0004] One application that requires multi-precision arithmetic is cryptography. Many public-key algorithms, including the Diffie-Hellman key exchange algorithm, elliptic curve cryptography, and the Elliptic Curve Digital Signature Algorithm (ECDSA), involve the multi-precision multiplication of large numbers. For example, elliptic curve systems perform multi-precision arithmetic on 128- to 256-bit numbers, while systems based on exponentiation employ 1024- to 2048-bit numbers.

[0005] In order to improve the performance of these and other cryptographic systems, it is desirable to improve the efficiency of the multi-precision multiplication algorithm. Any improvements, even if relatively small, can result in a significant increase in the overall performance of the application, because the multiplication algorithm can be called many times during normal operation.

SUMMARY

[0006] Methods and apparatus are disclosed for multiplying multi-precision numbers. In certain embodiments, fewer recursions are performed than in the known methods. As a result, the methods can be used to increase performance in a variety of applications that utilize multi-precision arithmetic. One particular application in which this improved performance is desirable is cryptography.

[0007] In one disclosed embodiment, a method of multiplying two operands is provided. In the method, the first operand is stored as a first array of n words. Similarly, the second operand is stored as a second array of n words. In the method, n is an integer whose value is a power of two. The first and second arrays may be padded with zeros so that they each have n words. The method of this embodiment further includes determining a first weighted sum from multiple subproducts of corresponding words of the first operand and the second operand. The corresponding words of the first operand and the second operand may be associated with a selected power of the radix. Determining the first weighted sum may also include adding a word-shifted version of at least one of the subproducts. The subproducts in the first weighted sum may correspond to branches from a corresponding recursion tree. Specifically, the subproducts may correspond to low or high branches having no mid-branch ancestors. The method of this embodiment additionally includes iteratively determining a next weighted sum from a previous weighted sum and a recursively calculated intermediate product. In one of the iterations, the previous weighted sum may be equal to the first weighted sum. In certain implementations, the next weighted sum includes a shifted version of the previous weighted sum.

[0008] The disclosed methods may be used in a number of different applications that utilize multi-precision arithmetic. For example, the method can be used to generate various cryptographic parameters. In one particular implementation, for instance, a private key and a base point are multiplied using one of the disclosed methods to obtain a product that is associated with a public key. In this implementation, the private key and the base point are multi-precision numbers having n words, wherein n is an integer that is a power of two. The disclosed methods may similarly be used in a signature generation or signature verification process (e.g., the Elliptic Curve Digital Signature Algorithm (ECDSA)).

[0009] The disclosed methods may be implemented in a variety of different software and hardware environments. Any of the disclosed methods may be implemented, for example, as a set of instructions stored on a computer-readable medium. The methods may also be implemented in a variety of integrated circuits, such as a field programmable gate array.

[0010] These and other features of the disclosed technology are described below with reference to the accompanying figures.

BRIEF DESCRIPTION OF THE DRAWINGS

[0011] FIG. 1 is a block diagram of an exemplary recursion tree.

[0012] FIG. 2 is a block diagram illustrating the operation of the Karatsuba-Ofman algorithm using the recursion tree of FIG. 1.

[0013] FIG. 3 is a flowchart showing a general method of multiplying multi-precision operands.

[0014] FIG. 4 is a flowchart showing a general method of performing process block 312 from FIG. 3.

[0015] FIG. 5 is a flowchart showing a general method of performing process block 314 from FIG. 3.

[0016] FIG. 6 is a flowchart showing a particular implementation of a method of multiplying multi-precision operands.

[0017] FIG. 7 is a first block diagram illustrating the operation of the method of FIG. 6 using the recursion tree of FIG. 1.

[0018] FIG. 8 is a second block diagram illustrating the operation of the method of FIG. 6 using the recursion tree of FIG. 1.

[0019] FIG. 9 is a third block diagram illustrating the operation of the method of FIG. 6 using the recursion tree of FIG. 1.

[0020] FIG. 10 is a fourth block diagram illustrating the operation of the method of FIG. 6 using the recursion tree of FIG. 1.

[0021] FIG. 11 is a block diagram of a general-purpose computer configured to perform multi-precision multiplication according to the disclosed methods.

[0022] FIG. 12 is a block diagram of a dedicated digital circuit configured to perform multi-precision multiplication according to the disclosed methods.

[0023] FIG. 13 is a block diagram of a cryptographic system configured to perform multi-precision multiplication according to the disclosed methods and to output a cryptographic parameter.

DETAILED DESCRIPTION

[0024] Disclosed below are representative embodiments that should not be construed as limiting in any way. Instead, the present disclosure is directed toward novel and nonobvious features and aspects of the various embodiments of the multi-precision multiplication methods and apparatus described below. The disclosed features and aspects can be used alone or in novel and nonobvious combinations and sub-combinations with one another.

[0025] Although the operations of the disclosed methods are described in a particular, sequential order for the sake of presentation, it should be understood that this manner of description encompasses minor rearrangements, unless a particular ordering is required. For example, operations described sequentially may in some cases be rearranged or performed concurrently. Moreover, for the sake of simplicity, the disclosed flowcharts typically do not show the various ways in which particular methods can be used in conjunction with other methods. Additionally, the detailed description sometimes uses terms like “determine” and “obtain” to describe the disclosed methods. These terms are high-level abstractions of the actual operations that are performed by a computer or digital circuit. The actual operations that correspond to these terms will vary depending on the particular implementation and are readily discernible by one of ordinary skill in the art.

[0026] As more fully described below, the disclosed methods can be implemented in a variety of different environments, including a general-purpose computer, an application-specific computer, an integrated circuit (e.g., a field programmable gate array), or in various other environments known in the art. The particular environments discussed, however, should not be construed to limit the scope of use or functionality of the disclosed methods.

[0027] General Considerations

[0028] Hardware or software implementations typically support a fixed word size. In these implementations, large numbers are typically represented as multiple words, wherein each word comprises multiple bits. As noted above, multi-word numbers are called “multi-precision” numbers and can be manipulated on a word-by-word basis using “multi-precision arithmetic.”

[0029] For purposes of the present disclosure, unsigned multi-precision numbers are denoted as bold-face variables. Accordingly, let a be a number stored in n words of w bits each. The words of a are denoted as a[0], a[1], . . . , a[n−1]. Further, let a₀, a₁, . . . , a_(nw−1) denote the bits of a from the least significant to the most significant. Thus, a[i] contains the bits a_(iw+j) for j=0, . . . , w−1 and represents the 1-word number $\sum_{j=0}^{w-1} a_{iw+j}\,2^{j}$.

[0030] The number a can also be written in radix z=2^(w) as follows:

$\begin{matrix}{a = \sum\limits_{i=0}^{n-1} a[i]\,z^{i}} & (1)\end{matrix}$

[0031] Alternatively, the multi-precision number a can be viewed as an array of words, from the 0th word to the (n−1)th word. Multiple subarrays can also be defined from the words that constitute a given multi-precision number. In particular, let a^(l)[k] denote the subarray containing the words a[k+i] for i=0, . . . , l−1 and represent the following l-word number:

$\begin{matrix}{a^{l}[k] = \sum\limits_{i=0}^{l-1} a[k+i]\,z^{i},} & (2)\end{matrix}$

[0032] wherein n and l are integers, k is an index to the first word of the subarray, and l is the length of the subarray in words.

[0033] Multi-word operations used in the methods described below include: (1) adding and subtracting; (2) multiplying by powers of z=2^(w); and (3) assigning values to subarrays. The addition and subtraction of two n-word numbers produces another n-word number, plus an extra bit. This extra bit is a carry bit for addition and a sign (or borrow) bit for subtraction. Multi-precision addition and subtraction are relatively easy operations. Further, because z is 2^(w), multiplying a number by z^(i) is equivalent to left-shifting the words of the corresponding array by i positions. That is, the jth word becomes the (i+j)th word. Because of the shifting, the 0th through (i−1)th words are emptied and typically filled (or “padded”) with zeros.

[0034] The subarray of a number can be assigned a number. For example, let a be an n-word number and b be an l-word number. Consider the following assignment:

a^(l)[k]:=b  (3)

[0035] This assignment overwrites the words of a. The words a[k+i] for i=0, . . . , l−1 are replaced with the words b[i] for i=0, . . . , l−1, respectively.
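The word-array notation of Equations (1) through (3) can be illustrated with a short Python sketch. The sketch is not part of the disclosed methods; the word size W = 16 and the helper names (to_words, from_words, subarray, shift_words, assign) are assumptions chosen only for illustration, and words are stored least significant first.

```python
W = 16            # assumed word size w in bits (illustrative choice)
Z = 1 << W        # radix z = 2^w


def to_words(value, n):
    """Split a non-negative integer into n words, least significant first."""
    return [(value >> (W * i)) & (Z - 1) for i in range(n)]


def from_words(words):
    """Evaluate Equation (1): the sum of a[i] * z^i."""
    return sum(word << (W * i) for i, word in enumerate(words))


def subarray(a, k, l):
    """Return a^l[k], the l words a[k], ..., a[k+l-1] of Equation (2)."""
    return a[k:k + l]


def shift_words(a, i):
    """Multiply by z^i: word j moves to position i + j; the low i words are zero."""
    return [0] * i + list(a)


def assign(a, k, b):
    """Perform a^l[k] := b of Equation (3), overwriting len(b) words of a in place."""
    a[k:k + len(b)] = b


a = to_words(0x123456789ABCDEF0, 4)
high_half = subarray(a, 2, 2)      # the two most significant words of a
assign(a, 0, [0, 0])               # clear the two least significant words
```

In this sketch, multiplying by a power of z costs no arithmetic: shift_words only changes the positions of the words, which mirrors the indexing argument made above.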

[0036] The Karatsuba-Ofman Algorithm (KOA)

[0037] The classical multi-precision multiplication algorithm straightforwardly multiplies every word of the first operand by every word of the other operand and adds the partial products. This algorithm is sometimes referred to as the grade-school method and has an O(n²) complexity, where n is the multiplicand size in words.
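For comparison with the recursive methods that follow, the grade-school method can be sketched in Python as below. This is a minimal illustration rather than a disclosed implementation; the word size W = 16 and the function name classical_multiply are assumptions, and both operands are lists of n words stored least significant first.

```python
W = 16                  # assumed word size in bits (illustrative choice)
MASK = (1 << W) - 1


def classical_multiply(a, b):
    """Return the 2n-word product of two n-word numbers using n*n word products."""
    n = len(a)
    t = [0] * (2 * n)
    for i in range(n):
        carry = 0
        for j in range(n):
            # One word product plus the running column value and carry
            # always fits in two words.
            acc = t[i + j] + a[i] * b[j] + carry
            t[i + j] = acc & MASK
            carry = acc >> W
        t[i + n] = carry    # this position has not been written by earlier rows
    return t


product = classical_multiply([0xF3D1, 0x0001], [0x6CA3, 0x0002])
```

The two nested loops make the n² word multiplications explicit, which is the cost that the Karatsuba-Ofman approach reduces.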

[0038] The Karatsuba-Ofman Algorithm (“KOA”) is an alternative multi-precision multiplication algorithm. The KOA has an O(n^(1.58)) asymptotic complexity, and thus multiplies large numbers faster than the classical method. KOA is a recursive algorithm that uses a so-called “divide-and-conquer” strategy. The following paragraphs describe the general principles underlying the KOA.

[0039] Let a and b be two n-word numbers where n is even. The n-word numbers a and b can be split into two parts as follows:

a=a_(L)+a_(H)z^(n/2)   b=b_(L)+b_(H)z^(n/2)  (4)

[0040] where a_(L)=a^(n/2)[0], b_(L)=b^(n/2)[0], a_(H)=a^(n/2)[n/2], and b_(H)=b^(n/2)[n/2]. In other words, a_(L) and b_(L) are the numbers represented by the low-order words (the first n/2 words), while a_(H) and b_(H) are the numbers represented by the high-order words (the last n/2 words).

[0041] Next, let t=a * b. Then, t can be written in terms of the half-sized numbers a_(L), b_(L), a_(H), and b_(H) as follows:

$\begin{matrix}{\begin{aligned} t &= a \cdot b \\ &= \left(a_{L} + a_{H}z^{n/2}\right)\left(b_{L} + b_{H}z^{n/2}\right) \\ &= a_{L}b_{L} + \left(a_{L}b_{H} + a_{H}b_{L}\right)z^{n/2} + a_{H}b_{H}z^{n} \end{aligned}} & (5)\end{matrix}$

[0042] As seen in Equation (5), the product t can be computed from four half-sized products: a_(L)b_(L), a_(L)b_(H), a_(H)b_(L), and a_(H)b_(H). On the other hand, the equality a_(L)b_(H)+a_(H)b_(L)=a_(L)b_(L)+a_(H)b_(H)+(a_(L)−a_(H))(b_(H)−b_(L)) can be used in Equation (5) to obtain:

t=a_(L)b_(L)+[a_(L)b_(L)+a_(H)b_(H)+(a_(L)−a_(H))(b_(H)−b_(L))]z^(n/2)+a_(H)b_(H)z^(n)  (6)

[0043] Equation (6) shows that three half-sized products are sufficient to compute t instead of four. These products are: a_(L)b_(L), a_(H)b_(H), and (a_(L)−a_(H))(b_(H)−b_(L)). Although the number of products is reduced, the number of additions and subtractions is increased. Because the complexity of adding and subtracting is linear (i.e., O(n)), however, the computation is simplified overall. Note that the multiplications by the powers of z=2^(w) in Equation (6) need no computation. These multiplications correspond to array shifts and can be achieved by the proper indexing of the arrays representing the numbers in (6).
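The equality used to pass from Equation (5) to Equation (6) can be checked by expanding the third product. The short derivation below uses only the symbols already defined in Equations (4) through (6).

```latex
\begin{aligned}
(a_{L} - a_{H})(b_{H} - b_{L}) &= a_{L}b_{H} - a_{L}b_{L} - a_{H}b_{H} + a_{H}b_{L}, \\
a_{L}b_{L} + a_{H}b_{H} + (a_{L} - a_{H})(b_{H} - b_{L}) &= a_{L}b_{H} + a_{H}b_{L}.
\end{aligned}
```

Substituting the second line for the middle coefficient of Equation (5) gives Equation (6) term by term.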

[0044] As shown in Equation (6), the KOA computes a product from three half-sized products. In a recursive process, KOA computes each of these half-sized products from three quarter-sized products. When the products reach some designated size (for example, when the multiplicands are reduced to one word), the recursion stops and the products are computed using classical methods.

[0045] The following pseudocode describes an exemplary implementation of the KOA function. In the disclosed function, the inputs are assumed to be multi-word numbers that can be split evenly into lower- and higher-order words at each recursion level. Thus, the size n of each input must be a power of two. The function does not need to be limited to this special case, however, as a general function can be derived for the KOA that splits the inputs when their size n is not divisible by two.

function: KOA(a, b : n-word number; n : integer)
    t : 2n-word number
    a_(L), a_(M), a_(H) : (n/2)-word number
    low, mid, high : n-word number
begin
    /* When the input size is one word */
    Step 1: if n = 1 then return t := a * b
    /* Generate 3 pairs of half sized numbers */
    Step 2: a_(L) := a^(n/2)[0]
    Step 3: b_(L) := b^(n/2)[0]
    Step 4: a_(H) := a^(n/2)[n/2]
    Step 5: b_(H) := b^(n/2)[n/2]
    Step 6: (s_(a), a_(M)) := a_(L) − a_(H)
    Step 7: (s_(b), b_(M)) := b_(H) − b_(L)
    /* Recursively multiply the half sized numbers */
    Step 8: low := KOA(a_(L), b_(L), n/2)
    Step 9: high := KOA(a_(H), b_(H), n/2)
    Step 10: mid := KOA(a_(M), b_(M), n/2)
    /* Combine the subproducts to obtain the output */
    Step 11: t := low + (low + high + s_(a)s_(b)mid) z^(n/2) + high z^(n)
    return t
end

[0046] In Step 1, the size of n is considered. If it is one (i.e., if the inputs are one-word numbers) the inputs are multiplied and the result returned. Otherwise, the function continues with the remaining steps. In Steps 2 through 5, (n/2)-word numbers a_(L), b_(L), a_(H), and b_(H) are generated from the lower- and higher-order words of the inputs. In Steps 6 and 7, a_(M), b_(M), s_(a), and s_(b) are produced by the subtraction operations described below:

s_(a)=sign(a_(L)−a_(H))   a_(M)=|a_(L)−a_(H)|

s_(b)=sign(b_(H)−b_(L))   b_(M)=|b_(H)−b_(L)|  (7)

[0047] The values of a_(M), b_(M), s_(a), and s_(b) are the magnitudes and signs of the subtractions, respectively, in Steps 6 and 7. Like a_(L), b_(L), a_(H), and b_(H), a_(M) and b_(M) have n/2 words. In Steps 8, 9, and 10, these n/2-word numbers are multiplied by recursive calls to the KOA function.

[0048] The values of low, high, and mid are calculated as follows:

low=a_(L)b_(L)

high=a_(H)b_(H)

mid=|a_(L)−a_(H)||b_(H)−b_(L)|

[0049] In Step 11, the product t=a * b is found using Equation (6). In this equation, low is substituted into a_(L)b_(L), high into a_(H)b_(H), and s_(a)s_(b)mid into (a_(L)−a_(H))(b_(H)−b_(L)). Note that s_(a)s_(b)mid=(s_(a)|a_(L)−a_(H)|)(s_(b)|b_(H)−b_(L)|)=(a_(L)−a_(H))(b_(H)−b_(L)).
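The sign-magnitude KOA function described in Steps 1 through 11 can be sketched in Python as follows. This is an illustrative sketch, not the disclosed implementation: it operates on Python integers rather than word arrays, the function name koa and the default word size w = 8 are assumptions, and n is assumed to be a power of two.

```python
def koa(a, b, n, w=8):
    """Multiply two non-negative n-word integers (n a power of two, w bits per word)."""
    # Step 1: one-word inputs are multiplied directly.
    if n == 1:
        return a * b

    half = n // 2
    shift = half * w                 # a shift by n/2 words is multiplication by z^(n/2)
    mask = (1 << shift) - 1

    # Steps 2-5: lower and higher halves of the inputs.
    a_lo, a_hi = a & mask, a >> shift
    b_lo, b_hi = b & mask, b >> shift

    # Steps 6-7: signs and magnitudes of a_L - a_H and b_H - b_L (Equation (7)).
    s_a = 1 if a_lo >= a_hi else -1
    s_b = 1 if b_hi >= b_lo else -1
    a_mid = abs(a_lo - a_hi)
    b_mid = abs(b_hi - b_lo)

    # Steps 8-10: three recursive half-sized multiplications.
    low = koa(a_lo, b_lo, half, w)
    high = koa(a_hi, b_hi, half, w)
    mid = koa(a_mid, b_mid, half, w)

    # Step 11: combine the subproducts as in Equation (6).
    return low + ((low + high + s_a * s_b * mid) << shift) + (high << (2 * shift))


# The operands of FIG. 1, treated as four 4-bit words each.
t = koa(0xF3D1, 0x6CA3, 4, w=4)
```

Because the middle coefficient equals a_(L)b_(H)+a_(H)b_(L), it is never negative, so the left shift in the final line is always applied to a non-negative value.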

[0050] The Karatsuba-Ofman Algorithm with Two's-Complement Arithmetic

[0051] In a typical computer-based implementation of the KOA, multi-word additions and subtractions are performed on a word-by-word basis using two's-complement arithmetic. The exemplary KOA function described above represents multi-word numbers in sign-magnitude form and does not describe the details of multi-word additions and subtractions. In this section, the KOA function is implemented using two's-complement representations for multi-word numbers.

[0052] The following pseudocode describes an exemplary implementation of the KOA function using two's-complement arithmetic.

function: KOAcomp(a, b : n-word number; n : integer)
    t : 2n-word number
    a_(L), a_(M), a_(H) : (n/2)-word number
    low, mid, high : n-word number
begin
    /* When the input size is one word */
    Step 1: if n = 1 then return t := a * b
    /* Generate 3 pairs of half sized numbers */
    Step 2: a_(L) := a^(n/2)[0]
    Step 3: b_(L) := b^(n/2)[0]
    Step 4: a_(H) := a^(n/2)[n/2]
    Step 5: b_(H) := b^(n/2)[n/2]
    Step 6: (b_(a), a_(M)) := a_(L) − a_(H)
    Step 7: (b_(b), b_(M)) := b_(H) − b_(L)
    Step 8: if b_(a) = 1 then a_(M) := NEG(a_(M))
    Step 9: if b_(b) = 1 then b_(M) := NEG(b_(M))
    /* Recursively multiply the half sized numbers */
    Step 10: t^(n)[0] := KOAcomp(a_(L), b_(L), n/2)
    Step 11: t^(n)[n] := KOAcomp(a_(H), b_(H), n/2)
    Step 12: mid := KOAcomp(a_(M), b_(M), n/2)
    /* Combine the subproducts to obtain the output */
    if b_(a) = b_(b) then
        Step 13: (c, mid) := t^(n)[0] + t^(n)[n] + mid
    else
        Step 14: (c, mid) := t^(n)[0] + t^(n)[n] − mid
    Step 15: (c′, t^(n)[n/2]) := t^(n)[n/2] + mid
    Step 16: t^(n/2)[3n/2] := t^(n/2)[3n/2] + c′ + c
    return t
end

[0053] The functions KOA and KOAcomp first differ in Steps 6 and 7. In these steps, subtractions produce the results in two's-complement form. The subtraction a_(L)−a_(H) produces the (n/2)-word number a_(M) and the 1-bit borrow b_(a). Similarly, the subtraction b_(H)−b_(L) produces the (n/2)-word number b_(M) and the 1-bit borrow b_(b). The NEG function seen in Steps 8 and 9 performs a negation operation, which is a two's-complement operation. In these steps, b_(a) and b_(b) are checked to determine the signs of (a_(L)−a_(H)) and (b_(H)−b_(L)). If b_(a)=1, the value of (a_(L)−a_(H)) is negative and a_(M) is negated. Likewise, if b_(b)=1, the value of (b_(H)−b_(L)) is negative and b_(M) is negated. In two's-complement form, the magnitude of a number is itself if it is positive, or its negation if it is negative. As a result, Steps 8 and 9 provide that a_(M)=|a_(L)−a_(H)| and b_(M)=|b_(L)−b_(H)|.

[0054] In Steps 10, 11, and 12, the products a_(L)b_(L), a_(H)b_(H), and a_(M)b_(M)=|a_(L)−a_(H)||b_(L)−b_(H)| are found. In these steps, the product a_(M)b_(M) is stored into mid, while the products a_(L)b_(L) and a_(H)b_(H) are stored into respective lower and higher halves of the output array t (t^(n)[0] and t^(n)[n]). In contrast to the KOA function, the local variables low and high are not defined or used, resulting in fewer memory resources being used.

[0055] In Steps 13 and 14, the sum a_(L)b_(H)+a_(H)b_(L)=a_(L)b_(L)+a_(H)b_(H)+(a_(L)−a_(H))(b_(H)−b_(L)) is found. The result is stored into the n-word variable mid and the 1-bit carry c. In this computation, t^(n)[0] and t^(n)[n] (which contain a_(L)b_(L) and a_(H)b_(H)) are added together. Moreover, if b_(a)=b_(b), mid=|a_(L)−a_(H)||b_(L)−b_(H)| is added to the sum. Otherwise, mid is subtracted from the sum. In essence, then, (a_(L)−a_(H))(b_(H)−b_(L)) is added to the sum. Accordingly, (c, mid)=a_(L)b_(L)+a_(H)b_(H)+(a_(L)−a_(H))(b_(H)−b_(L)) and t=a_(L)b_(L)+a_(H)b_(H)z^(n).

[0056] In Steps 15 and 16, t is added to the term [a_(L)b_(L)+a_(H)b_(H)+(a_(L)−a_(H))(b_(H)−b_(L))]z^(n/2) so that t=a * b. To perform this operation, the subarray t^(n)[n/2] is added to mid in Step 15. This addition yields the carry-bit c′. Then, the carry bits c and c′ are propagated through the most significant n/2 words of t in Step 16.
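The word-level behavior of the KOAcomp function can be sketched in Python as follows. The sketch is offered under stated assumptions and is not the disclosed implementation: the word size W = 8, the helper names (add_words, sub_words, neg_words, koa_comp), and the use of Python lists with the least significant word first are all illustrative choices, and the input length is assumed to be a power of two.

```python
W = 8                       # assumed word size in bits (illustrative choice)
MASK = (1 << W) - 1


def add_words(x, y):
    """Word-by-word addition; returns (carry bit, sum words)."""
    carry, out = 0, []
    for xi, yi in zip(x, y):
        s = xi + yi + carry
        out.append(s & MASK)
        carry = s >> W
    return carry, out


def sub_words(x, y):
    """Word-by-word subtraction in two's complement; returns (borrow bit, words)."""
    borrow, out = 0, []
    for xi, yi in zip(x, y):
        d = xi - yi - borrow
        out.append(d & MASK)
        borrow = 1 if d < 0 else 0
    return borrow, out


def neg_words(x):
    """Two's-complement negation (the NEG function of Steps 8 and 9)."""
    return sub_words([0] * len(x), x)[1]


def koa_comp(a, b):
    """Return the 2n-word product of two n-word numbers, n a power of two."""
    n = len(a)
    if n == 1:                                           # Step 1
        p = a[0] * b[0]
        return [p & MASK, p >> W]
    h = n // 2
    a_lo, a_hi, b_lo, b_hi = a[:h], a[h:], b[:h], b[h:]  # Steps 2-5
    b_a, a_mid = sub_words(a_lo, a_hi)                   # Step 6
    b_b, b_mid = sub_words(b_hi, b_lo)                   # Step 7
    if b_a:
        a_mid = neg_words(a_mid)                         # Step 8
    if b_b:
        b_mid = neg_words(b_mid)                         # Step 9
    t = koa_comp(a_lo, b_lo) + koa_comp(a_hi, b_hi)      # Steps 10-11: t^n[0], t^n[n]
    mid = koa_comp(a_mid, b_mid)                         # Step 12
    c, total = add_words(t[:n], t[n:])                   # t^n[0] + t^n[n]
    if b_a == b_b:
        extra, mid = add_words(total, mid)               # Step 13
        c += extra
    else:
        borrow, mid = sub_words(total, mid)              # Step 14
        c -= borrow
    c_prime, t[h:h + n] = add_words(t[h:h + n], mid)     # Step 15
    carry = c + c_prime                                  # Step 16: propagate carries
    for i in range(h + n, 2 * n):
        s = t[i] + carry
        t[i] = s & MASK
        carry = s >> W
    return t


words = koa_comp([0x01, 0x02], [0x03, 0x04])   # 0x0201 * 0x0403
```

As in the pseudocode, the low and high subproducts are written directly into the two halves of t, so only mid needs separate storage.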

[0057] Complexity of the KOA

[0058] In this section, the complexity of the exemplary KOAcomp function is determined. In the complexity analysis that follows, the cost of manipulating the carry and borrow bit is ignored because it is small in comparison to the multi-word operations.

[0059] The following table gives the numbers of word-operations, word-reads, and word-writes needed when the input length is n>1. The first, second, and third columns give the number of word-operations, memory reads, and memory writes, respectively. Steps 2 through 5 are ignored because a_(L), a_(H), b_(L), and b_(H) are just copies of the lower and higher halves of the inputs. In practice, pointers to the lower and higher halves of the inputs are used instead of copies. The KOAcomp function performs two n/2-word subtractions in Steps 6 and 7, two n-word additions in Step 13 (or one n-word addition and one n-word subtraction in Step 14), one n-word addition in Step 15, and one n/2-word addition with an input carry in Step 16.

TABLE 1 The complexity of a recursive call with the input length n > 1.

Step No.      Operation    Read    Write
6, 7          n            2n      n
8, 9          n/2          n/2     n/2
10, 11, 12    recursions
13, 14        2n           4n      2n
15            n            2n      n
16            n/2          n/2     n/2
Totals        5n           9n      5n

[0060] The function also performs an n/2-word negation in Step 8 if b_(a)=1, and another one in Step 9 if b_(b)=1. Assuming that b_(a) and b_(b) are equally probable to be one or zero, each recursive call averages one n/2-word negation.

[0061] For each multi-word operation, the number of word-writes is equal to the number of word-operations, while the number of word-reads is equal to the number of word-operations multiplied by the number of multi-word operands.

[0062] In Steps 10, 11, and 12, there are three recursive calls with half-sized inputs. Consequently, the complexity T(n) can be found as follows:

T(n)=3T(n/2)+μn  (9)

[0063] where μn is the total number of word-operations, word-reads, and word-writes given in Table 1. Thus, μn=5n+9n+5n=19n. Using Equation (9), and assuming n=2^(k) for some integer k:

$\begin{matrix}{\begin{aligned} T(n) &= 3^{k}T(1) + \left[(3/2)^{k-1} + \ldots + (3/2)^{2} + 3/2 + 1\right]\mu n \\ &= 3^{k}T(1) + \left[1 + 2/3 + (2/3)^{2} + \ldots + (2/3)^{k-1}\right]\mu n\,(3/2)^{k-1} \\ &= 3^{k}T(1) + 3\left[1 - (2/3)^{k}\right]\mu n\,(3/2)^{k-1} \\ &= 3^{k}\left[T(1) + 2\mu\right] - 2\mu n \end{aligned}} & (10)\end{matrix}$

[0064] where T(1) is the complexity of one-word multiplication. Moreover, 3^(k)=(2^(k))^(log₂3)=n^(log₂3)≈n^(1.58). Thus,

T(n)=n^(1.58)[T(1)+2μ]−2μn.  (11)
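The closed form of Equation (10) can be checked numerically against the recurrence of Equation (9). The following Python sketch is illustrative only; the function names are assumptions, μ = 19 is taken from Table 1, and the value T(1) = 1 is a hypothetical placeholder, since the cost of a one-word multiplication is not specified here.

```python
def cost_recursive(n, t1, mu):
    """Evaluate T(n) = 3 T(n/2) + mu * n directly, with T(1) = t1 (Equation (9))."""
    if n == 1:
        return t1
    return 3 * cost_recursive(n // 2, t1, mu) + mu * n


def cost_closed_form(n, t1, mu):
    """Evaluate 3^k [T(1) + 2 mu] - 2 mu n for n = 2^k (last line of Equation (10))."""
    k = n.bit_length() - 1
    return 3 ** k * (t1 + 2 * mu) - 2 * mu * n


MU = 19          # total of the Operation, Read, and Write columns of Table 1, per word
T1 = 1           # hypothetical cost of a one-word multiplication

for k in range(0, 11):
    n = 1 << k
    assert cost_recursive(n, T1, MU) == cost_closed_form(n, T1, MU)
```

Working with 3^k rather than n^(1.58) keeps the comparison exact, because n^(log₂3) is only approximately n^(1.58).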

[0065] Recursivity of KOA

[0066] Consider the multiplication of n=2^(k)>1 word numbers with the KOA, where k is some integer. Let r(n) be the number of recursions needed for this computation. The initial call makes three recursive calls with n/2-word inputs. These three recursive calls each lead to r(n/2) recursions. Thus, the following recurrence may be calculated:

$\begin{matrix}{\begin{aligned} r(n) &= 3 + 3\,r(n/2) \\ &= 3 + 9 + \ldots + 3^{k} + 3^{k}\,r(1) \\ &= 3 + 9 + \ldots + 3^{k} = 3\left(3^{k} - 1\right)/2. \end{aligned}} & (12)\end{matrix}$
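Equation (12) relies on r(1)=0, because a call with one-word inputs makes no further recursive calls. The recurrence and its closed form can be compared with a brief Python sketch; the function names are assumptions introduced for illustration.

```python
def count_recursions(n):
    """Number of recursive calls made when the KOA multiplies n-word inputs."""
    if n == 1:
        return 0                       # one-word inputs make no recursive calls
    return 3 + 3 * count_recursions(n // 2)


def recursions_closed_form(n):
    """Evaluate 3 (3^k - 1) / 2 for n = 2^k, as in Equation (12)."""
    k = n.bit_length() - 1
    return 3 * (3 ** k - 1) // 2


# The two-level tree of FIG. 1 (n = 4) has 3 first-level and 9 second-level branches.
assert count_recursions(4) == 12 == recursions_closed_form(4)
```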

[0067] Recursion Tree Analysis and Terminology

[0068] A recursion tree is a diagram that depicts the recursions in an algorithm. Recursion trees can be particularly helpful in the analysis of recursive algorithms like the KOA that may call themselves more than once in a recursion step.

[0069] The recursion tree of an algorithm can be thought of as a hierarchical tree structure wherein each branch represents a recursive call of the algorithm. FIG. 1 shows an exemplary recursion tree 100 that depicts the multiplication of two exemplary operands using an algorithm similar to the KOA. In the example shown in FIG. 1, the hexadecimal numbers “F3D1” and “6CA3” are multiplied. The initial call to the algorithm is represented by the root 110 of the tree 100. The recursive calls made by the initial call constitute the first level of recursion 120 and are represented by the first-level branches 122, 124, 126 emerging from the root 110. The recursive calls made by the branches 122, 124, 126 constitute a second level of recursion 130 and are represented in the recursion tree 100 by the second-level branches 131 through 139 emerging from the first-level branches 122, 124, 126. This process of recursion may continue until a final recursion level is reached, but in the illustrated example extends only two recursion levels. A branch emerging from another branch may be called a “child.” Similarly, the branch from which the child stems may be called the “parent.” In FIG. 1, for instance, branch 131 is the child of branch 122. In the recursion tree, if a branch represents a particular recursive call, its children represent the recursive calls made by that call. In other words, a “caller-callee” relationship in an algorithm corresponds to a “parent-child” relationship in the recursion tree. If a recursive call made at some recursion level makes no further recursive calls, the branch representing it in the tree has no children, and may be called a “leaf.” In the recursion tree depicted in FIG. 1, three recursive calls are made by each of the three branches 122, 124, and 126. Thus, three branches emerge from each preceding branch. The leaves 131-139 represent multiplications of one-word inputs, which do not make any recursive calls because they can be easily calculated using classical methods. Generally speaking, the size of the input parameters is reduced by half in each successive recursion level in the recursion tree. Thus, at some level, the branches will have one-word inputs and cease to make any further recursive calls.

[0070] Recursive tree terminology may be used to describe the KOA or a similar divide-and-conquer algorithm. For example, if one recursive call invokes another, the first recursive call may be referred to as the parent, and the latter recursive call as the child. Thus, a branch may be used as a synonym for a recursive call, and a leaf as a synonym for a recursive call with one-word inputs. Additionally, a path is defined as a sequence of branches from the root in which each branch is a child of the previous one.

[0071] Consider the branch 122 in the KOA. This branch is a call to the KOA function described above. It has two inputs, “F3” and “6C”. From these inputs, the branch 122 generates the half-sized pairs (a_(L), b_(L)), (a_(H), b_(H)), and (a_(M), b_(M)) (that is, (3,C), (F,6), and (−C,−6), respectively). Its children take these pairs as inputs, multiply them, and return the subproducts low, high, and mid in Steps 8 through 10.
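The pairs generated by branch 122 can be reproduced with a small Python sketch. The sketch assumes 4-bit words, so that “F3” and “6C” are two-word numbers, and the function name split_inputs is hypothetical. The mid pair is shown as signed values, matching the (−C,−6) notation above rather than the separate sign and magnitude of Steps 6 and 7.

```python
W = 4    # assumed word size in bits, so each hexadecimal digit is one word


def split_inputs(a, b, n):
    """Return the input pairs passed to the low, high, and mid children."""
    shift = (n // 2) * W
    mask = (1 << shift) - 1
    a_lo, a_hi = a & mask, a >> shift
    b_lo, b_hi = b & mask, b >> shift
    return {
        "low": (a_lo, b_lo),
        "high": (a_hi, b_hi),
        "mid": (a_lo - a_hi, b_hi - b_lo),   # signed differences
    }


pairs = split_inputs(0xF3, 0x6C, 2)

# Each child multiplies its pair; the parent combines them as in Equation (6).
low = pairs["low"][0] * pairs["low"][1]
high = pairs["high"][0] * pairs["high"][1]
mid = pairs["mid"][0] * pairs["mid"][1]
product = low + ((low + high + mid) << W) + (high << (2 * W))
```

Here pairs["mid"] evaluates to (−12, −6), which is (−C,−6) in hexadecimal, and product equals the ordinary product of the two inputs.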

[0072] In the KOA, there are three choices for a branch. A branch either takes the input pair (a_(L), b_(L)) from its parent and returns the subproduct low, takes the input pair (a_(H), b_(H)) and returns the subproduct high, or takes the input pair (a_(M), b_(M)) and returns the subproduct mid. For purposes of this disclosure, these first, second, and third types of branches are called low, high, and mid branches, respectively. This classification of the branches is given in Table 2 below.

TABLE 2 The classification of the branches in the tree

LOW BRANCH     takes the input pair (a_(L), b_(L)) from its parent
               returns the subproduct low to its parent
HIGH BRANCH    takes the input pair (a_(H), b_(H)) from its parent
               returns the subproduct high to its parent
MID BRANCH     takes the input pair (a_(M), b_(M)) from its parent
               returns the subproduct mid to its parent

[0073] In each recursion level k, a special set of branches B_(k) also exists. The common property of the branches in this set is that neither they nor their ancestors are mid branches. The root satisfies this property because it has no ancestor and is not a mid branch. On the other hand, a branch in a further recursion level satisfies this property if it and all its ancestors, except the root, are low and high branches. Thus, B_(k) for k≧0 can be defined as follows:

[0074] Definition 1: B₀ is the set whose only element is the root. B_(k) for k≧1 is the set of branches at the kth recursion level that are low or high branches and whose ancestors, except the root, are all low and high branches.

[0075] The branches in the set B_(k) constitute “paths” of low and high branches starting from the root. These paths are unique for each branch in the set B_(k). Using this fact, a unique element number for the branches in the set B_(k) can be defined as follows:

[0076] Definition 2: The element number of the root in the set B₀ is zero. The element number of a branch in the set B_(k) for k≧1 is a k-digit binary number i=(i₁i₂ . . . i_(j) . . . i_(k))₂ where i_(j) is the jth most significant digit. In this number, if the branch is a high branch, i_(k) is 1. If the branch is a low branch, i_(k) is 0. Similarly, for j<k, if the branch's ancestor in the jth recursion level is a high branch, i_(j) is 1, and if the ancestor is a low branch, i_(j) is 0.

[0077] Definition 3: For purposes of this disclosure, B_(k,i) denotes the branch in the set B_(k) with the element number i. Like any other branch, B_(k,i) computes the product of its inputs as an output. The product computed by this branch is denoted by P_(k,i).

[0078] The following proposition gives the inputs of the branches in the set B_(k) for the case in which the input length of the root is specially chosen:

[0079] Proposition 1: Let n be the input size of the root such that 2^(k)|n for some integer k≧0. Then, the inputs of B_(k,i) are the m-word numbers a^(m)[im] and b^(m)[im] where m=n/2^(k). Also, the output of B_(k,i) is the 2m-word product P_(k,i)=a^(m)[im]b^(m)[im].

[0080] The proof for this proposition proceeds as follows. Consider a branch in B_(k) and its ancestors. The branches constitute a path of low and high branches starting from the root. As more fully explained below, these branches each have inputs in the form of a^(length)[index], b^(length)[index] for some integers index and length. The low and high branches have, respectively, the input pairs (a_(L), b_(L)) and (a_(H), b_(H)) generated by their parent. These are the lower- and higher-order words, or subarrays, of the parent's inputs. Then, if a path starting from the root always traverses low and high branches, the inputs of the branches on this path will all be single subarrays of the root's inputs a and b. Thus, the inputs of these branches are a subarray of a and a subarray of b. Moreover, these subarrays have the same index and length. This is because the first and the second inputs are generated in Steps 2 to 7 in the same way, except that the former is generated from the words of a and the latter is generated from the words of b.

[0081] The inputs in the form a^(length)[index], b^(length)[index] can be identified by their index and length parameters. In the remaining part of the proof, these parameters are investigated for the inputs of the branches in the special sets and the inputs of their ancestors. Let 0≦j≦k. Then, 2^(j)|n and the inputs are evenly divided in half for each recursion level. Thus, the input length of a branch in the jth recursion level is exactly n/2^(j) words. Now, consider a branch in the set B_(k) with the element number i. As more fully explained above, its inputs and its ancestors' inputs can be given as a^(n/2^(j))[index_(j)], b^(n/2^(j))[index_(j)] where 0≦j≦k and index_(j) is some appropriate integer. These expressions must yield the root's inputs for j=0 (i.e., a^(n)[index₀]=a and b^(n)[index₀]=b). Then, index₀=0. The ancestor in the jth recursion level is either a low or a high child of the one in the (j−1)th recursion level. Thus, its inputs are either the lower or higher halves of the inputs of the one in the (j−1)th recursion level. In the former case, index_(j)=index_(j−1). In the latter case, index_(j)=index_(j−1)+n/2^(j). Note that index_(j)=index_(j−1)+i_(j)n/2^(j) for both cases, where i_(j) is the jth digit of the element number i. Applying this equation for j=1, . . . , k, the following equality can be obtained:

$$\begin{aligned}
\mathrm{index}_k &= \mathrm{index}_0 + \sum_{j=1}^{k} i_j\, n/2^{j} \\
&= \mathrm{index}_0 + \Bigl(\sum_{j=1}^{k} i_j\, 2^{k-j}\Bigr)\, n/2^{k} \\
&= \mathrm{index}_0 + (i_1 i_2 \cdots i_k)_2\, m \\
&= \mathrm{index}_0 + im = im
\end{aligned} \tag{13}$$

[0082] It can be seen from this equation that a^(n/2^(j))[index_(j)] and b^(n/2^(j))[index_(j)] yield a^(m)[im] and b^(m)[im] for j=k. Thus, the inputs of a branch in the set B_(k) with the element number i are a^(m)[im] and b^(m)[im]. The output of this branch is the product P_(k,i)=a^(m)[im]b^(m)[im], and the proof is complete.
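The index recurrence used in this proof can be checked numerically. The following Python sketch is illustrative only: the function names `branch_index` and `element_number` are hypothetical, and representing a path as a sequence of digits (0 for a low branch, 1 for a high branch) is an assumption made for the example.

```python
def branch_index(bits, n):
    """Follow a root-to-branch path of low (0) / high (1) choices.

    bits[j-1] plays the role of the digit i_j of the element number.
    Returns (index_k, m): the word index and word length of the branch inputs.
    """
    index, length = 0, n
    for bit in bits:
        length //= 2           # inputs are halved at every recursion level
        index += bit * length  # a high branch skips the lower half
    return index, length


def element_number(bits):
    """Read the path as the binary number (i_1 i_2 ... i_k)_2."""
    i = 0
    for bit in bits:
        i = 2 * i + bit
    return i
```

For example, with n=16 and the path high, low, high, the element number is 5, m is 2, and the inputs start at word index 10, which agrees with index_(k)=im.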

[0083] The following proposition describes the children of the branches in the special sets for which the input length of the root is a power of two.

[0084] Proposition 2: Let n be the input size of the root such that 2^(k)|n for some integer k≧0. Consider the branch B_(k−1,i) where m=n/2^(k). For this branch, the low child is B_(k,2i), the high child is B_(k,2i+1), and the mid child has the inputs |a^(m)[2im]−a^(m)[(2i+1)m]| and |b^(m)[(2i+1)m]−b^(m)[2im]|.

[0085] The proof for this proposition proceeds as follows. Assume that k−1>0. Definition 1 implies that if a branch is in the set B_(k−1), its low and high children are in the set B_(k). According to Definition 2, the element number of such a branch is a (k−1)-digit number and those of its children are k-digit numbers. Note that the children and the parent share the same ancestry. It follows from Definition 2 that the most significant k−1 digits are the same for the element numbers of a branch in B_(k−1) and its children in B_(k). Then, if the element number of the branch is i, the element numbers of its children are 2i+i_(k), where i_(k) is the least significant digit of these element numbers. According to Definition 2, i_(k)=0 for the low child, and i_(k)=1 for the high child. It follows that the element numbers of the low and high children are 2i and 2i+1, respectively, as stated in the proposition above. Since the element numbers of these children are known, their inputs can be found from Proposition 1. The inputs of the low child are a_(L)=a^(m)[2im] and b_(L)=b^(m)[2im], while those of the high child are a_(H)=a^(m)[(2i+1)m] and b_(H)=b^(m)[(2i+1)m]. The inputs of the mid child were defined above as a_(M)=|a_(L)−a_(H)| and b_(M)=|b_(H)−b_(L)|. Substituting the values of a_(L), b_(L), a_(H), and b_(H) yields a_(M)=|a^(m)[2im]−a^(m)[(2i+1)m]| and b_(M)=|b^(m)[(2i+1)m]−b^(m)[2im]|. Thus, the inputs of the mid child can be written as in Proposition 2.

[0086] If k−1=0, B_(k−1) is B₀, the set whose only element is the root. The arguments in the proof are valid for this case as well, except that the element number of the root is not, and cannot be, a (k−1=0)-digit number. This condition, however, does not affect the above proposition.

[0087] Every branch in the KOA has two inputs and computes their product as an output. However, the branches in the KOA do not compute the products of their inputs by directly multiplying them unless the branch is a leaf of the recursion tree. Instead, the branches compute the products by appropriately shifting and adding the recursively determined subproducts computed by the branches' children. This computation is performed in Step 11 of the KOA function described above. The equation in this step expresses the product computed by a branch in terms of the subproducts computed by its children. Using the equation in Step 11, the product of a branch can be decomposed in terms of the special sets of branches described above. The following proposition illustrates this decomposition for the case in which the input length of the root is specially chosen to be an integer power of two:

[0088] Proposition 3: Let n be the input size of the root such that 2^(k)|n for some integer k≧0. Then, P_(k−1,i) can be decomposed into subproducts as follows:

P_(k−1,i)=(1+z^(m))(P_(k,2i)+z^(m)P_(k,2i+1))+z^(m)s_(a)s_(b)mid  (14)

[0089] where m=n/2^(k), mid=|a^(m)[2im]−a^(m)[(2i+1)m]| |b^(m)[(2i+1)m]−b^(m)[2im]|, s_(a)=sign(a^(m)[2im]−a^(m)[(2i+1)m]), and s_(b)=sign(b^(m)[(2i+1)m]−b^(m)[2im]).

[0090] The proof of this proposition proceeds as follows. Consider the branch B_(k−1,i). Because it is in the (k−1)th recursion level, this branch has inputs of n/2^(k−1)=2m words. According to Definition 3, the branch computes the product P_(k−1,i). As can be understood from the KOA function described above, the product P_(k−1,i) can be decomposed into the following subproducts:

P_(k−1,i)=low+(low+high+s_(a)s_(b)mid)z^(m)+high z^(2m)  (15)

[0091] Note that the low and high children of B_(k−1,i) are B_(k,2i) and B_(k,2i+1) according to Proposition 2. Also, note that B_(k,2i) and B_(k,2i+1) compute the products P_(k,2i) and P_(k,2i+1). Thus, the substitutions low=P_(k,2i) and high=P_(k,2i+1) can be made in the above equation. After these substitutions and a little rearrangement (collecting the terms in low and high under the common factor 1+z^(m)), Equation (14) can be obtained.

[0092] As noted above, the low and high children are B_(k,2i) and B_(k,2i+1). Thus, according to Proposition 1, the inputs of the low child are a_(L)=a^(m)[2im] and b_(L)=b^(m)[2im], while those of the high child are a_(H)=a^(m)[(2i+1)m] and b_(H)=b^(m)[(2i+1)m]. Note that mid=|a_(L)−a_(H)| |b_(H)−b_(L)|, s_(a)=sign(a_(L)−a_(H)), and s_(b)=sign(b_(H)−b_(L)) are defined above. Substituting the values of a_(L), b_(L), a_(H), and b_(H), the same definitions for mid, s_(a), and s_(b) as in Proposition 3 can be obtained.
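The decomposition in Proposition 3 can be sanity-checked on ordinary integers. The sketch below is not part of the disclosed method: the helper names `sign` and `decompose` are hypothetical, the operands are held as Python integers rather than word arrays, and the word size w=4 is assumed to match the hexadecimal example used elsewhere in this disclosure.

```python
def sign(x):
    """Return -1, 0, or 1 according to the sign of x."""
    return (x > 0) - (x < 0)


def decompose(a, b, m, w=4):
    """Rebuild the product of two 2m-word integers using Equation (14)."""
    z_m = 1 << (w * m)                    # z^m with z = 2^w
    a_lo, a_hi = a % z_m, a // z_m        # lower and higher m-word halves of a
    b_lo, b_hi = b % z_m, b // z_m        # lower and higher m-word halves of b
    low, high = a_lo * b_lo, a_hi * b_hi  # P_(k,2i) and P_(k,2i+1)
    mid = abs(a_lo - a_hi) * abs(b_hi - b_lo)
    s_a, s_b = sign(a_lo - a_hi), sign(b_hi - b_lo)
    return (1 + z_m) * (low + z_m * high) + z_m * s_a * s_b * mid
```

Calling `decompose(0xD1, 0xA3, 1)` or `decompose(0xF3D1, 0x6CA3, 2)` returns the same value as multiplying the two operands directly.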

[0093] Alternate Methods of Multi-Precision Multiplication

[0094] In this section, new methods for multiplying multi-word numbers are described. Although the methods are described in terms of particular embodiments, the particular embodiments discussed are not limiting and may vary in their implementation details.

[0095] In the KOA, a branch in some recursion level computes a product and benefits from the computations performed by its descendants in later recursion levels. This branch, however, is completely independent from the other branches in the same recursion level. FIG. 2 illustrates how the KOA calculates independent paths during the multiplication operation. In particular, FIG. 2 shows how the KOA multiplies the operands from FIG. 1. According to the KOA function described above, the algorithm first determines the product of the low branch 126 at Step 8. To perform this calculation, the KOA function recursively calls itself to multiply “D1” and “A3” together. Because the low, mid, and high children from branch 126 are single-word products, the multiplication is performed using classical methods at the leaves 137, 138, and 139. The products “82,” “−54,” and “3,” respectively, are returned to branch 126 and combined in Step 11 of the recursive KOA function to obtain the result “8513”. Because this result is obtained before branches 122 and 124 are calculated, this path is designated as Path 1 in FIG. 2. According to the KOA function described above, the high branch 122 is the next branch to be calculated in Step 9, followed by the mid branch 124 in Step 10. These subsequent recursive operations are labelled as Paths 2 and 3 in FIG. 2 and obtain their respective products in a fashion similar to that of Path 1. The KOA function combines the results from the branches 122, 124, 126 in Step 11, which provides that t:=low+(low+high+s_(a)s_(b)mid)z^(n/2)+high z^(n), and returns the final product “67776A13.”
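The recursion walked through above can be sketched in Python. This is a minimal illustration under stated assumptions, not the KOA function of this disclosure: the name `koa` is hypothetical, operands are Python integers instead of word arrays, the word size is assumed to be w=4 bits, and n is assumed to be a power of two. The low, high, and mid products are computed in the order described for Steps 8, 9, and 10.

```python
def koa(a, b, n, w=4):
    """Multiply two n-word integers with three recursive half-size products."""
    if n == 1:
        return a * b                       # leaf: classical single-word product
    h = n // 2
    z_h = 1 << (w * h)                     # z^(n/2) with z = 2^w
    a_lo, a_hi = a % z_h, a // z_h
    b_lo, b_hi = b % z_h, b // z_h
    low = koa(a_lo, b_lo, h, w)            # low branch
    high = koa(a_hi, b_hi, h, w)           # high branch
    da, db = a_lo - a_hi, b_hi - b_lo
    mid = koa(abs(da), abs(db), h, w)      # mid branch on the magnitudes
    s = 1 if (da < 0) == (db < 0) else -1  # s_a * s_b (irrelevant when mid is 0)
    return low + (low + high + s * mid) * z_h + high * z_h * z_h
```

With these assumptions, `koa(0xD1, 0xA3, 2)` combines the leaf products 3, 0x82, and −0x54 into 0x8513, the value obtained along Path 1.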

[0096] In some of the embodiments described below, computations that are performed by branches on independent paths of the corresponding recursion tree are combined. In other words, branches that do not have a common parent are combined. In certain implementations, for instance, the branches that are combined are those in the special sets defined above.

[0097] FIG. 3 is a flowchart of a general method 300 for multiplying operands. At process block 310, operands a and b are obtained and stored (e.g., in computer memory). In certain embodiments, the operands a and b are evaluated and, if necessary, stored as operands having n words, wherein n is an integer that is a power of two. For example, operands a and b may be manipulated such that they have n words. In one particular implementation, for instance, one or both of the operands from process block 310 are padded with zeros to have n words. In process block 312, a first weighted sum is determined. The weighted sum comprises multiple subproducts that result from multiplying the words of operand a with the words of operand b. As more fully described below, the subproducts comprise computations from at least two independent paths of the corresponding recursion tree. Thus, the general method in FIG. 3 combines subproducts that are obtained in independent paths of the KOA. In process block 314, another weighted sum is calculated. This weighted sum is determined in part from a previously calculated weighted sum and a subproduct resulting from a recursive call to the general method 300. In the first iteration, the previously calculated weighted sum is the first weighted sum from process block 312. In subsequent iterations, however, the previously calculated weighted sum may be the weighted sum from the immediately previous iteration. In process block 316, a determination is made as to whether any further iterations are necessary. This determination may depend on the word size of the original operands a and b. If further iterations are required, process block 314 is repeated. If no further iterations are required, the weighted sum determined at process block 314 is the final product, and this value is returned at process block 318. The value may, for example, be returned to a user, stored in memory, or used in further processing.

[0098] FIG. 4 is a flowchart showing an exemplary general method 400 of performing the operation in process block 312 of determining a first weighted sum. At process block 410, multiple subproducts are found by multiplying the individual words of operand a with corresponding words from operand b. As more fully described below, the corresponding words in operand b may have the same position relative to the radix point as the words in operand a. These subproducts may be obtained using classical multiplication methods because they involve multiplying individual words. At process block 412, the multiple subproducts from process block 410 are shifted into a predetermined alignment. For example, in one particular implementation, the subproducts are shifted by an amount related to a power of the radix of the words. For instance, if the hexadecimal words “3” and “C” from the input operands “F3D1” and “6CA3” are multiplied, their product may be left shifted by two words, or eight bits. This shift corresponds to multiplying the result by the square of the radix (i.e., 16², or z² for z=2^(w)=2⁴). Once the subproducts have been shifted by the appropriate amount, the subproducts are added together at process block 414. Because of the shifting of process block 412, the sum obtained at process block 414 is a weighted sum rather than an ordinary sum.
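The three process blocks of FIG. 4 can be illustrated with a short Python sketch. The function names `split_words` and `first_weighted_sum` are hypothetical, and the sketch assumes that word index 0 holds the least significant word and that the word size is w=4 bits.

```python
def split_words(x, n, w=4):
    """Split an integer into n words of w bits, least significant word first."""
    mask = (1 << w) - 1
    return [(x >> (w * i)) & mask for i in range(n)]


def first_weighted_sum(a_words, b_words, w=4):
    """Multiply corresponding words, shift product i by i words, and add."""
    total = 0
    for i, (x, y) in enumerate(zip(a_words, b_words)):
        subproduct = x * y                # process block 410: single-word product
        total += subproduct << (w * i)    # blocks 412 and 414: shift, then add
    return total
```

For the operands “F3D1” and “6CA3”, the words “3” and “C” both sit at index 2, so their product 0x24 enters the sum as 0x2400, and the complete first weighted sum under these assumptions is 0x5CC23.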

[0099] FIG. 5 is a flowchart showing an exemplary general method 500 of performing the operation in process block 314 of determining a next weighted sum. At process block 510, a shift is performed on the previous weighted sum. The amount of the shift is related to the particular iteration being performed. For example, in one implementation, the previous weighted sum is shifted by n/2^(k) words, where k=log₂ n and n is the number of words in the operands being multiplied in that particular iteration. Thus, for example, if the first iteration involves multiplying operands “F3D1” and “6CA3,” then k=log₂ 4=2, and the previous weighted sum (i.e., the first weighted sum determined) is shifted by 4/2²=1 word. In process block 512, an intermediate product is determined by a recursive call to the general method 300 described in FIG. 3. In one particular implementation, the intermediate product determined is (a_(L)−a_(H))(b_(H)−b_(L)), which corresponds to a mid branch of the equivalent recursion tree. Because this is a recursive step that may lead to further recursions, depending on the size of the operands, the values of the previous weighted sum can be stored until the subsequent recursions return the value of the desired intermediate product. In process block 514, the next weighted sum is obtained by adding the previous weighted sum, the shifted version of the previous weighted sum obtained in process block 510, and the recursively determined intermediate product obtained in process block 512.

[0100] One exemplary embodiment 600 of the general method 300 is shown in FIG. 6 and discussed in greater detail below. Generally speaking, the exemplary embodiment 600 uses weighted sums of the subproducts computed by the branches in each special set. These weighted sums are denoted by sumP_(k) for k≧0, and their weights are the powers of z=2^(w). The definitions of sumP_(k) for k≧0 are described below.

[0101] Definition 4: Let n be the input size of the root such that 2^(k)|n for some integer k≧0. The value of sumP_(k) is equal to the following weighted sum of the products P_(k,i) for i=0, . . . , 2^(k)−1:

$$\mathrm{sumP}_k = \sum_{i=0}^{2^k-1} P_{k,i}\, z^{i(n/2^k)} \tag{16}$$

[0102] Note that i is a k-digit number. That is, 0≦i≦2^(k)−1. This is because i is the element number of B_(k,i).

[0103] It can be seen from Definition 4 that if the input size of the root is divisible by 2^(k) (i.e., 2^(k)|n), sumP_(k), sumP_(k−1), . . . , sumP₁, sumP₀ are all defined. Among these weighted sums, sumP₀ is of particular interest because it equals the product computed by the root, P_(0,0):

$$\mathrm{sumP}_0 = \sum_{i=0}^{2^0-1} P_{0,i}\, z^{i(n/2^0)} = P_{0,0} \tag{17}$$

[0104] The value of sumP₀ is the final result of the multiplication.

[0105] In the KOA, the P_(k,i) for i=0, . . . , 2^(k)−1 are computed by the branches in B_(k) individually. In the method of FIG. 6, however, the products are not individually computed, but are included in a weighted sum sumP_(k). In this way, computations performed by the branches in B_(k) are combined.

[0106] In process block 610 of FIG. 6, operands a and b are obtained. In this embodiment, the input size of the root operands is limited to n words, where n is an integer power of two. The operands may be padded with zeroes to obtain the proper size. The recursion depth is given by log₂ n, and sumP_(k) can be defined for all recursion levels k from 0 to log₂ n.

[0107] In process block 612, a weighted sum sumP_(log₂ n) is determined in terms of the inputs of the root. The following proposition describes sumP_(log₂ n):

[0108] Proposition 4: Let the root have the inputs a and b. Let the size of these inputs be n=2^(k₀) where k₀ is some integer. Then, sumP_(log₂ n)=sumP_(k₀) is the following weighted sum:

$$\mathrm{sumP}_{\log_2 n} = \sum_{i=0}^{n-1} a[i] * b[i]\, z^{i} \tag{18}$$

[0109] The proof of this proposition proceeds from Definition 4, which provides:

$$\mathrm{sumP}_{k_0} = \sum_{i=0}^{2^{k_0}-1} P_{k_0,i}\, z^{i(n/2^{k_0})} = \sum_{i=0}^{n-1} P_{k_0,i}\, z^{i}$$

[0110] From Proposition 1, it can be shown that P_(k,i)=a^(m)[im] * b^(m)[im] where m=n/2^(k). For k=k₀, m=n/2^(k₀)=1. Thus, P_(k₀,i)=a[i] * b[i].

[0111] In process blocks 613-617, an iterative process is performed that results in the product of a and b. In the embodiment shown in FIG. 6, the number of iterations performed is log₂ n, and k is set to this value at process block 613. At process block 614, a weighted sum sumP_(k−1) is determined from sumP_(k) (i.e., the previously calculated weighted sum). At process block 616, a determination is made whether k equals 1, and thus whether sumP₀ has been calculated. If not, then the value of k is decremented by one at process block 617 and process block 614 is repeated. If k is 1, then sumP₀ is assigned as the product of a and b. The relationship between the iterations can be defined by the following proposition:

[0112] Proposition 5: Let n be the input size of the root such that 2^(k)|n for some integer k≧0. Then, sumP_(k−1) is related to sumP_(k) according to the following equation:

$$\mathrm{sumP}_{k-1} = (1+z^{m})\,\mathrm{sumP}_{k} + \sum_{i=0}^{2^{k-1}-1} s_a(i)\, s_b(i)\, \mathrm{mid}(i)\, z^{(2i+1)m} \tag{19}$$

[0113] where m=n/2^(k), mid(i)=|a^(m)[2im]−a^(m)[(2i+1)m]| |b^(m)[(2i+1)m]−b^(m)[2im]|, s_(a)(i)=sign(a^(m)[2im]−a^(m)[(2i+1)m]), and s_(b)(i)=sign(b^(m)[(2i+1)m]−b^(m)[2im]).

[0114] The proof of Proposition 5 proceeds from Definition 4, from which the following equation can be obtained:

$$\mathrm{sumP}_{k-1} = \sum_{i=0}^{2^{k-1}-1} P_{k-1,i}\, z^{i(n/2^{k-1})} = \sum_{i=0}^{2^{k-1}-1} P_{k-1,i}\, z^{2im} \tag{20}$$

[0115] Substituting the right-hand side of Equation (14) for P_(k−1,i) in the above equation gives:

$$\begin{aligned}
\mathrm{sumP}_{k-1} &= \sum_{i=0}^{2^{k-1}-1} \left[(1+z^{m})\left(P_{k,2i} + z^{m} P_{k,2i+1}\right) + z^{m} s_a s_b\, \mathrm{mid}\right] z^{2im} \\
&= (1+z^{m}) \sum_{i=0}^{2^{k-1}-1} \left(z^{2im} P_{k,2i} + z^{(2i+1)m} P_{k,2i+1}\right) + \sum_{i=0}^{2^{k-1}-1} s_a s_b\, \mathrm{mid}\; z^{(2i+1)m} \\
&= (1+z^{m}) \sum_{i=0}^{2^{k}-1} P_{k,i}\, z^{im} + \sum_{i=0}^{2^{k-1}-1} s_a s_b\, \mathrm{mid}\; z^{(2i+1)m} \\
&= (1+z^{m})\, \mathrm{sumP}_{k} + \sum_{i=0}^{2^{k-1}-1} s_a s_b\, \mathrm{mid}\; z^{(2i+1)m}
\end{aligned} \tag{21}$$

[0116] Note that s_(a), s_(b), and mid in the proposition above are as given in Proposition 3. They are also functions of i, as described in Equation (19).
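The iteration defined by Propositions 4 and 5 can be followed on ordinary integers. The sketch below is only an illustration under stated assumptions: the names `sum_p_chain` and `block` are hypothetical, operands are Python integers with word 0 least significant, w=4 bits, n is a power of two, and each signed mid product is formed with Python's built-in multiplication instead of a recursive call.

```python
def sum_p_chain(a, b, n, w=4):
    """Return [sumP_(log2 n), ..., sumP_1, sumP_0] for n-word integers a and b."""

    def block(x, index, m):
        """The m-word subarray x^m[index], returned as an integer."""
        return (x >> (w * index)) & ((1 << (w * m)) - 1)

    levels = n.bit_length() - 1            # log2(n) for a power of two
    # Equation (18): weighted sum of the word-by-word products.
    s = sum((block(a, i, 1) * block(b, i, 1)) << (w * i) for i in range(n))
    chain = [s]
    for k in range(levels, 0, -1):
        m = n >> k                         # m = n / 2^k
        s = (1 + (1 << (w * m))) * s       # (1 + z^m) * sumP_k
        for i in range(1 << (k - 1)):
            da = block(a, 2 * i * m, m) - block(a, (2 * i + 1) * m, m)
            db = block(b, (2 * i + 1) * m, m) - block(b, 2 * i * m, m)
            # da * db equals s_a(i) * s_b(i) * mid(i); weight it by z^((2i+1)m).
            s += (da * db) * (1 << (w * (2 * i + 1) * m))
        chain.append(s)                    # this is sumP_(k-1)
    return chain
```

For the operands 0xF3D1 and 0x6CA3 with n=4, the chain under these assumptions is 0x5CC23, 0x670913, and finally the full product of the two operands.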

[0117] During the computations, sumP_(k) may need to be stored. The size of this multi-word number is given in the following proposition:

[0118] Proposition 6: Let n be the input size of the root such that 2^(k)|n for some integer k≧0. Then, the multi-word number sumP_(k) is of n+m words where m=n/2^(k).

[0119] The proof of Proposition 6 proceeds from Definition 4, which shows that P_(k,i) is weighted by powers of z. The largest power of z is (2^(k)−1)n/2^(k)=n−m. Thus, sumP_(k) has at least n−m words. Each power of z multiplies one of the products P_(k,i). Thus, the size of sumP_(k) is n−m plus the size of P_(k,i). The size of P_(k,i) is 2m words because it is the product of two m-word numbers, as shown in Proposition 1. Accordingly, sumP_(k) is n+m words.
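The size stated in Proposition 6 can also be confirmed with an explicit bound. The following derivation is a supplementary sketch rather than part of the proof above; it assumes only that each m-word input is at most z^(m)−1, so that each P_(k,i) is at most (z^(m)−1)², and it uses the geometric sum of the weights in Definition 4.

```latex
\begin{aligned}
\mathrm{sumP}_k
  &= \sum_{i=0}^{2^k-1} P_{k,i}\, z^{im}
   \le (z^{m}-1)^2 \sum_{i=0}^{2^k-1} z^{im} \\
  &= (z^{m}-1)^2 \,\frac{z^{n}-1}{z^{m}-1}
   = (z^{m}-1)(z^{n}-1) < z^{\,n+m}
\end{aligned}
```

Under this assumption, the overlapping products never carry out of the n+m words allotted to sumP_(k).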

[0120] At process block 618, sumP₀ (the product of the operands a and b) is returned.

[0121] The embodiment described in FIG. 6 may be implemented according to the following algorithm. Because the operand size n (in words) in the disclosed algorithm is a power of two, the algorithm is referred to as “KOA2^(k)” for convenience. The output t is the 2n-word product of the inputs. During the operation of the algorithm, sumP_(k) is stored in the words of t from t[α] to t[2n−1]. Note that sumP_(k) is n+m words. Consequently, α=2n−(n+m)=n−m. The algorithm KOA2^(k) may be defined as follows:

function: KOA2^(k)(a, b : n-word number; n : integer)
    t : 2n-word number
    α, m : integer
    a_(M), b_(M) : m-word number    /* max(m) = n/2 */
    mid : 2m-word number
begin
    /* When the input size is one word */
    Step 1:  if n = 1 then return t := a * b
    /* Initialization */
    Step 2:  m := 1; α := n − m
    /* Compute sumP_(log₂ n) */
    Step 3:  (C, S) := a[0] * b[0]
    Step 4:  t[α] := S
    for i = 1 to n − 1
        Step 5:  (C, S) := a[i] * b[i] + C
        Step 6:  t[α + i] := S
    endfor
    Step 7:  t[α + n] := C
    /* Compute sumP_(k) */
    for k = log₂ n down to 1
        Step 8:  t^(m)[α − m] := t^(m)[α]
        Step 9:  t^(n)[α] := t^(n)[α] + t^(n)[α + m]
        Step 10: c := 0; b := 0
        for i = 0 to 2^(k−1) − 1
            Step 11: (b_(a), a_(M)) := a^(m)[2im] − a^(m)[(2i + 1)m]
            Step 12: (b_(b), b_(M)) := b^(m)[(2i + 1)m] − b^(m)[2im]
            Step 13: if b_(a) = 1 then a_(M) := NEG(a_(M))
            Step 14: if b_(b) = 1 then b_(M) := NEG(b_(M))
            Step 15: mid := KOA2^(k)(a_(M), b_(M), m)
            Step 16: if b_(a) = b_(b) then
                Step 17: (c, t^(2m)[α + 2im]) := t^(2m)[α + 2im] + mid + c
                Step 18: (b, t^(2m)[α + 2im]) := t^(2m)[α + 2im] − b
            Step 19: if b_(a) ≠ b_(b) then
                Step 20: (b, t^(2m)[α + 2im]) := t^(2m)[α + 2im] − mid − b
                Step 21: (c, t^(2m)[α + 2im]) := t^(2m)[α + 2im] + c
        endfor
        Step 22: m := 2m; α := n − m
    endfor
    return t
end

[0122] In Step 1, n is evaluated. If n is one (i.e., if the inputs are single words), the inputs are directly multiplied and the result returned. Otherwise, the algorithm continues with the remaining steps. Step 2 initializes the variables m and α. In Steps 3 to 7, sumP_(log₂ n), which equals

$$\sum_{i=0}^{n-1} a[i] * b[i]\, z^{i}$$

[0123] according to Proposition 4, is computed. The result is stored in the words t[α] through t[α+n] of the output array t, where α=n−m=n−1. The product a[i] * b[i] for i=0, . . . , n−1 yields a two-word result (C, S). C and S are the most and least significant words, respectively. Because this product is multiplied by z^(i), S is added to t[α+i] and C to t[α+i+1].

[0124] In Steps 8 to 22, sumP_(k−1) is obtained from sumP_(k) iteratively. These steps are inside a loop running from k=log₂ n down to k=1. Because m=n/2^(k), m is multiplied by 2 in Step 22 after each iteration. This step also ensures that α=n−m. At the beginning of each iteration, sumP_(k) is available in the words of t from t[α] through t[2n−1].

[0125] The value of sumP_(k−1) is computed in the manner shown in Equation (19). In Steps 8 and 9, (1+z^(m))sumP_(k) is computed and the result stored into the words from t[α−m] through t[2n−1]. This result is added to

$$\sum_{i=0}^{2^{k-1}-1} s_a(i)\, s_b(i)\, \mathrm{mid}(i)\, z^{(2i+1)m}$$

[0126] to obtain sumP_(k−1). The steps from 10 to 21 perform this operation using two's-complement arithmetic.

[0127] In Steps 11 to 15, the 2m-word mid(i) is computed and stored into mid. The value of mid(i) is defined in Proposition 5. According to this definition, two subtractions are performed in Steps 11 and 12. After the subtractions, the m-word numbers a_(M) and b_(M) are obtained with the borrow bits b_(a) and b_(b). Note that b_(a) and b_(b) encode the signs s_(a)(i) and s_(b)(i): a borrow bit of 1 indicates that the corresponding difference is negative. Steps 13 and 14 ensure that a_(M) and b_(M) are equal to the magnitudes of the differences computed in Steps 11 and 12. Finally, a_(M) and b_(M) are multiplied in a recursive call at Step 15 to obtain mid(i).

[0128] Recall that the multi-word number (1+z^(m))sumP_(k) has been computed and stored into the words of t. In Steps 16 to 21, s_(a)(i)s_(b)(i)mid(i)z^((2i+1)m) is added to this number. Because these steps are in a “for” loop counting from i=0 to 2^(k−1)−1, the result is

$$\mathrm{sumP}_{k-1} = (1+z^{m})\,\mathrm{sumP}_{k} + \sum_{i=0}^{2^{k-1}-1} s_a(i)\, s_b(i)\, \mathrm{mid}(i)\, z^{(2i+1)m}$$

[0129] In Step 16, the borrow bits b_(a) and b_(b) are evaluated. If b_(a)=b_(b), then s_(a)(i)s_(b)(i)mid(i)=mid(i), and the value of mid(i)z^((2i+1)m) is added to (1+z^(m))sumP_(k). This operation is accomplished in Step 17. In this step, mid(i) is contained in mid, and (1+z^(m))sumP_(k) is contained in the words from t[α−m] to t[2n−1]. Because mid(i) is 2m words and multiplied by z^((2i+1)m), mid is added to the 2m-word subarray t^(2m)[α−m+(2i+1)m]=t^(2m)[α+2im].

[0130] If b_(a)≠b_(b), then s_(a)(i)s_(b)(i)mid(i)=−mid(i), and the value of mid is subtracted instead of added, as seen in Step 20. These additions and subtractions yield the borrow bit b and the carry bit c. These bits should be propagated in both cases. In Step 10, these bits are set to zero.

[0131] After these iterations have been performed, sumP₀ is obtained in the 0th to (2n−1)th words of t. As noted above, sumP₀ is the final result of the algorithm and constitutes the product a * b.
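The in-place layout described above, with sumP_(k) held in t[α] through t[2n−1], can be sketched in Python. This is a hedged illustration, not the disclosed implementation: the names `koa2k`, `add_at`, `to_int`, and `to_words` are hypothetical, word 0 is assumed to be least significant, and w=4 bits is assumed by default. Two simplifications are made for brevity: the subtractions of Steps 11 to 14 are performed on Python integers, and carries and borrows are rippled upward immediately by `add_at` rather than being held in the bits c and b.

```python
def to_int(words, w):
    """Little-endian word list -> integer."""
    return sum(x << (w * j) for j, x in enumerate(words))


def to_words(x, count, w):
    """Integer -> little-endian list of `count` words of w bits."""
    mask = (1 << w) - 1
    return [(x >> (w * j)) & mask for j in range(count)]


def add_at(t, pos, words, sign, w):
    """Add (sign=+1) or subtract (sign=-1) `words` into t starting at t[pos]."""
    mask = (1 << w) - 1
    carry, j = 0, 0
    while j < len(words) or carry:
        v = t[pos + j] + carry + (sign * words[j] if j < len(words) else 0)
        t[pos + j] = v & mask
        carry = v >> w                     # +1 is a carry, -1 is a borrow
        j += 1


def koa2k(a, b, w=4):
    """Multiply two n-word lists (n a power of two); returns 2n words."""
    n = len(a)
    if n == 1:                                            # Step 1
        return to_words(a[0] * b[0], 2, w)
    t = [0] * (2 * n)
    m, alpha = 1, n - 1                                   # Step 2
    carry = 0
    for i in range(n):                                    # Steps 3 to 7
        p = a[i] * b[i] + carry
        t[alpha + i], carry = p & ((1 << w) - 1), p >> w
    t[alpha + n] = carry
    while m < n:                                          # k = log2(n) down to 1
        t[alpha - m:alpha] = t[alpha:alpha + m]           # Step 8
        add_at(t, alpha, t[alpha + m:alpha + m + n], +1, w)   # Step 9
        for i in range(n // (2 * m)):                     # i = 0 .. 2^(k-1) - 1
            lo, hi = 2 * i * m, (2 * i + 1) * m
            da = to_int(a[lo:hi], w) - to_int(a[hi:hi + m], w)   # Step 11
            db = to_int(b[hi:hi + m], w) - to_int(b[lo:hi], w)   # Step 12
            mid = koa2k(to_words(abs(da), m, w),          # Steps 13 to 15
                        to_words(abs(db), m, w), w)
            sign = 1 if (da < 0) == (db < 0) else -1
            add_at(t, alpha + 2 * i * m, mid, sign, w)    # Steps 16 to 21
        m, alpha = 2 * m, n - 2 * m                       # Step 22
    return t
```

As a usage example under the same assumptions, `koa2k([0x1, 0xD], [0x3, 0xA])` returns the words `[3, 1, 5, 8]`, which is 0x8513 read from the most significant word down.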

[0132] Complexity of KOA2^(k)

[0133] In this section, the complexity of the exemplary KOA2^(k) algorithm described in the previous section is analyzed. In the complexity analysis, the cost of the carry and borrow bit manipulations is ignored, as it was in the previous complexity analysis.

[0134] Table 3 gives the numbers of word operations, word reads, and word writes needed for input lengths n>1. Specifically, the first, second, and third columns give the number of word operations, memory reads, and memory writes, respectively.

TABLE 3
The complexity of a call to KOA2^(k) with an input length n > 1.

Step No.         Operation              Read              Write
3, 4, 5, 6, 7    nT(1) + 2n − 2         -                 -
8                -                      n − 1             n − 1
9                n log₂ n               2n log₂ n         n log₂ n
11               (n/2) log₂ n           n log₂ n          (n/2) log₂ n
12               (n/2) log₂ n           n log₂ n          (n/2) log₂ n
13, 14           (n/2) log₂ n           (n/2) log₂ n      (n/2) log₂ n
15               recursions
17, 20           n log₂ n               2n log₂ n         n log₂ n
18, 21           n log₂ n               n log₂ n          n log₂ n
Total            nT(1) + 16.5n log₂ n + 4n − 4

[0135] In Steps 3 to 7, the words of a and b are read, multiplied, and stored into the words of t. The cost of these operations is included in T(1), and there are n multiplications in these steps. Thus, the total cost is nT(1) due to the multiplications. Also, the addition in Step 5 must be taken into account. This is a two-word addition operation in a loop iterating n−1 times. Thus, it costs a total of (2n−2) word operations. It is assumed that C and S in the steps from 3 to 7 are register variables. Thus, the cost of accessing them is not taken into account.

[0136] In Step 8, there is an (m=n/2^(k))-word assignment in a loop iterating log₂ n times. This results in a total of

$$\sum_{k=1}^{\log_2 n} n/2^{k} = n - 1$$

[0137] word assignments. In Step 9, the addition of the n-word numbers occurs in the same loop. Thus, Step 9 costs a total of n log₂ n word additions.

[0138] Steps 11 to 21 are performed in two loops. The first loop iterates log₂ n times, and the second loop iterates 2^(k−1) times. Steps 11 to 14 perform operations on m-word numbers. Thus, (m2^(k−1) log₂ n)=(n/2) log₂ n word operations are needed to perform each of these steps. On the other hand, Steps 17 to 21 perform operations on 2m-word numbers. Thus, (2m2^(k−1) log₂ n)=n log₂ n word operations are needed to perform each of these steps.

[0139] According to the previous paragraph, Table 3 gives the number of word operations as (n/2) log₂ n for Steps 11 to 14. However, the situation is different for Steps 13 and 14. The m-word negation operations in these steps are conditionally executed, and it is assumed that their execution probability is ½. Thus, their total complexity equals the complexity of one m-word negation on average, as shown in Table 3.

[0140] As seen in Table 3, a single value for the complexity of Steps 17 and 20 is computed. Similarly, a single value is used for Steps 18 and 21. This is because either Steps 17 and 18 or Steps 20 and 21 are executed, depending on the condition b_(a)=b_(b). In Steps 17 and 20, there are two memory reads and one memory write for each word operation. In Steps 18 and 21, there are one memory read and one memory write for each word operation. As noted above, performing each of Steps 17 through 21 takes n log₂ n word operations.

[0141] The recursion occurs in Step 15. The recursive call in this step has (m=n/2^(k))-word inputs and is in two “for” loops. Thus, the complexity T(n) satisfies the recurrence:

$$T(n) = \sum_{k=1}^{\log_2 n} 2^{k-1}\, T(n/2^{k}) + \mathrm{Total}(n) \tag{22}$$

[0142] where Total(n) is the total number of operations, reads, and writes given in Table 3 (i.e., nT(1)+16.5n log₂ n+4n−4). This recursion equation may be simplified as follows:

$$\begin{aligned}
T(n/2) &= \sum_{k=1}^{\log_2 (n/2)} 2^{k-1}\, T\bigl((n/2)/2^{k}\bigr) + \mathrm{Total}(n/2) \\
&= (1/2) \sum_{k=1}^{\log_2 n - 1} 2^{k}\, T(n/2^{k+1}) + \mathrm{Total}(n/2) \\
&= (1/2) \sum_{k=2}^{\log_2 n} 2^{k-1}\, T(n/2^{k}) + \mathrm{Total}(n/2)
\end{aligned} \tag{23}$$

[0143] Next, consider the following subtraction:

$$\begin{aligned}
T(n) - 2T(n/2) = {} & \sum_{k=1}^{\log_2 n} 2^{k-1}\, T(n/2^{k}) + \mathrm{Total}(n) \\
& - \sum_{k=2}^{\log_2 n} 2^{k-1}\, T(n/2^{k}) - 2\,\mathrm{Total}(n/2)
\end{aligned} \tag{24}$$

[0144] After cancellations:

$$\begin{aligned}
T(n) - 2T(n/2) &= \sum_{k=1}^{1} 2^{k-1}\, T(n/2^{k}) + \mathrm{Total}(n) - 2\,\mathrm{Total}(n/2) \\
&= T(n/2) + \mathrm{Total}(n) - 2\,\mathrm{Total}(n/2)
\end{aligned} \tag{25}$$

[0145] Therefore, the following recurrence for the algorithm KOA2^(k) can be obtained:

$$\begin{aligned}
T(n) &= 3\,T(n/2) + \mathrm{Total}(n) - 2\,\mathrm{Total}(n/2) \\
&= 3\,T(n/2) + 16.5n + 4
\end{aligned} \tag{26}$$

[0146] The recurrence relation above is similar to the recurrence relation of the KOA. Recall that the recurrence relation for the KOAcomp function described above is:

T(n)=3T(n/2)+19n

[0147] The solution for the recurrence relation is given in Equation (11). The complexity T(n) is O(n^(1.58)) in this solution. Equation (26) can be solved in the same fashion to find the complexity of the KOA2^(k), which is again O(n^(1.58)). However, because 16.5n+4<19n for n>1, the KOA2^(k) is less complex than the KOA.
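The simplification from Equation (22) to Equation (26) can be checked numerically. In the sketch below, the function names `total` and `T` mirror the symbols in the text, while the value chosen for T(1) is purely hypothetical (the simplification holds for any value). Exact rational arithmetic is used so that the coefficient 16.5 introduces no rounding.

```python
from fractions import Fraction
from functools import lru_cache

T1 = Fraction(7)  # hypothetical cost of a single-word multiplication, T(1)


def total(n):
    """Total(n) from Table 3: nT(1) + 16.5 n log2(n) + 4n - 4."""
    log_n = n.bit_length() - 1            # exact log2 for a power of two
    return n * T1 + Fraction(33, 2) * n * log_n + 4 * n - 4


@lru_cache(maxsize=None)
def T(n):
    """Evaluate Equation (22) directly, with T(1) as the base case."""
    if n == 1:
        return T1
    log_n = n.bit_length() - 1
    calls = sum(2 ** (k - 1) * T(n >> k) for k in range(1, log_n + 1))
    return calls + total(n)


# Equation (26) should reproduce Equation (22) for every power of two n > 1.
for e in range(1, 11):
    n = 1 << e
    assert T(n) == 3 * T(n // 2) + Fraction(33, 2) * n + 4
```

The loop completes without an assertion error for n from 2 to 1024, which is consistent with the derivation above.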

[0148] Recursivity of KOA2^(k)

[0149] Let r(n) be the number of recursive calls needed to multiply n-word numbers by the KOA2^(k). The KOA2^(k) makes 2^(k−1) recursive calls with (m=n/2^(k))-word inputs in a loop iterating from k=1 to log₂ n. The following recurrence therefore results:

$$\begin{aligned}
r(n) &= \sum_{k=1}^{\log_2 n} 2^{k-1} + \sum_{k=1}^{\log_2 n} 2^{k-1}\, r(n/2^{k}) \\
&= n - 1 + \sum_{k=1}^{\log_2 n} 2^{k-1}\, r(n/2^{k})
\end{aligned} \tag{27}$$

[0150] This recursion equation may be simplified as follows:$\begin{matrix}\begin{matrix}{{r\left( {n/2} \right)} = {{n/2} - 1 + {\sum\limits_{k = 1}^{\log_{2}{({n/2})}}{2^{k - 1}\quad r\quad \left( {{n/2}/2^{k}} \right)}}}} \\{= {{n/2} - 1 + {\left( {1/2} \right){\sum\limits_{k = 1}^{{\log_{2}n} - 1}{2^{k}{r\left( {n/2^{k + 1}} \right)}}}}}} \\{= {{n/2} - 1 + {\left( {1/2} \right){\sum\limits_{k = 2}^{\log_{2}n}{2^{k - 1}{r\left( {n/2^{k}} \right)}}}}}}\end{matrix} & (28)\end{matrix}$

[0151] Next, consider the following subtraction: $\begin{matrix}{{{r(n)} - {2{r\left( {n/2} \right)}}} = {n - 1 + {\sum\limits_{k = 1}^{\log_{2}n}{2^{k - 1}{r\left( {n/2^{k}} \right)}}} - \left( {n - 2 + {\sum\limits_{k = 2}^{\log_{2}n}{2^{k - 1}{r\left( {n/2^{k}} \right)}}}} \right)}} & (29)\end{matrix}$

[0152] After cancellations: $\begin{matrix}\begin{matrix}{r(n) - 2r(n/2) = 1 + \sum\limits_{k = 1}^{1} 2^{k - 1} r(n/2^{k})} \\ {= 1 + r(n/2)}\end{matrix} & (30)\end{matrix}$

[0153] The following recurrence is eventually obtained: $\begin{matrix}\begin{matrix}{{r(n)} = {1 + {3\quad {r\left( {n/2} \right)}}}} \\{= {1 + 3 + \ldots + 3^{k - 1} + {3^{k}{r(1)}}}} \\{= {{1 + 3 + \ldots + 3^{k - 1}} = {\left( {3^{k} - 1} \right)/2}}}\end{matrix} & (31)\end{matrix}$

[0154] In Equation (12), it was found that the recursivity of the KOA was 3(3^(k)−1)/2, where k=log₂ n. Therefore, the KOA2^(k) algorithm makes one third as many recursive calls as the KOA.
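The closed form can be cross-checked against the summation form of the recurrence in Equation (27). The Python sketch below is illustrative; the function names are hypothetical, and r(1)=0 is assumed, as implied by the last step of Equation (31).

```python
from functools import lru_cache


@lru_cache(maxsize=None)
def r_koa2k(n):
    """Recursive-call count from Equation (27):
    r(n) = sum over k = 1..log2(n) of 2^(k-1) * (1 + r(n / 2^k)), with r(1) = 0.
    """
    if n == 1:
        return 0
    total = 0
    k = 1
    while 2 ** k <= n:
        total += 2 ** (k - 1) * (1 + r_koa2k(n // 2 ** k))
        k += 1
    return total


def r_koa2k_closed(n):
    """Closed form (3^k - 1) / 2 from Equation (31), with k = log2(n)."""
    k = n.bit_length() - 1
    return (3 ** k - 1) // 2


def r_koa_closed(n):
    """Closed form 3(3^k - 1) / 2 quoted for the KOA, with k = log2(n)."""
    k = n.bit_length() - 1
    return 3 * (3 ** k - 1) // 2


if __name__ == "__main__":
    for k in range(0, 6):
        n = 2 ** k
        print(n, r_koa2k(n), r_koa2k_closed(n), r_koa_closed(n))
```

For n=4, the summation gives 4 calls, which agrees with (3²−1)/2, while the KOA figure is 12.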

[0155] An Example of Multiplication Using KOA2^(k)

[0156] The operation of the algorithm KOA2^(k) is illustrated in the following example and FIGS. 7 through 11. FIGS. 7 through 11 illustrate the operation of the KOA2^(k) algorithm by relating it to the recursion tree of FIG. 1. In this example, two numbers “F3D1” and “6CA3” are multiplied together. Both numbers comprise four hexadecimal values, which can be associated with four 4-bit words. Thus, the operand size is n=4, and the word size is w=4. Note that the operand size is a power of two. Let a denote F3D1 and a[i] denote the ith digit of F3D1, counted from the least significant digit so that a[0]=1. Also, let b denote 6CA3 and b[i] denote the ith digit of 6CA3, so that b[0]=3.

[0157] The First Weighted Sum

[0158] In Steps 3 to 7, sumP_(log₂ n)=sumP₂ is computed. As stated in Proposition 4, sumP_(log₂ n) is equal to $\sum\limits_{i = 0}^{n - 1} a[i] * b[i]\, z^{i}.$

[0159] Thus,

a[0] * b[0] = 1 * 3 = 03,  a[1] * b[1] = D * A = 82,

a[2] * b[2] = 3 * C = 24,  a[3] * b[3] = F * 6 = 5A.

[0160] Multiplication by z=2^(w) is equivalent to a 1-word shift. Thus, sumP₂ is found as follows: $\begin{array}{r}{03} \\ {82\phantom{0}} \\ {24\phantom{00}} \\ {+\;5A\phantom{000}} \\ \hline {{sumP}_{2} = 5CC23}\end{array}$
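The first weighted sum for this example can be reproduced with a few lines of Python. This is a minimal sketch, assuming the words are held in lists with the least significant word first; the name `first_weighted_sum` is hypothetical.

```python
W = 4  # word size in bits, so z = 2**W

# Operands from the example, least significant word first.
a = [0x1, 0xD, 0x3, 0xF]  # F3D1
b = [0x3, 0xA, 0xC, 0x6]  # 6CA3


def first_weighted_sum(a, b, w):
    """Return the sum over i of a[i] * b[i] * z^i, with z = 2**w."""
    return sum((x * y) << (w * i) for i, (x, y) in enumerate(zip(a, b)))


sum_p2 = first_weighted_sum(a, b, W)
print(f"{sum_p2:X}")  # prints 5CC23
```

Each subproduct is shifted left by i words before the addition, which is the staggered column layout of the sum shown above.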

[0161] FIG. 7 illustrates these steps in terms of the corresponding recursion tree. As seen in FIG. 7, the four products that are added in the weighted sum sumP₂ correspond to branches 131, 133, 137, 139. Branches 131, 133, 137, 139 are leaves of the recursion tree and belong to the special set described above consisting of low or high branches with no mid-branch ancestors. Moreover, the amount of shifting performed on each subproduct before obtaining the weighted sum is related to the positions of the multiplied words relative to the radix.

[0162] The Iterative Steps

[0163] Steps 8 to 22 are inside a “for loop”. This loop implements the iteration in Equation (19). In the first iteration of the loop, sumP₁ is computed from sumP₂. In the second iteration, sumP₀ is computed from sumP₁.

[0164] Steps 8 and 9 (1st Iteration)

[0165] In Steps 8 and 9, the term (1+z^(m))sumP_(k) in Equation (19) is computed. In the first iteration, k=log₂ n=2 and m=n/2^(k)=1. The term (1+z)sumP₂ is obtained by shifting sumP₂ by one word and adding the result to sumP₂ itself. $\begin{array}{r}{5CC23} \\ {+\;5CC230} \\ \hline {628E53}\end{array}$

[0166] Steps 11 to 15 (1st Iteration)

[0167] In Steps 11 to 15, the terms s_(a)(i)s_(b)(i)mid(i) z^((2i+1)m) in Equation (19) for i=0, . . . , 2^(k−1)−1 are computed. $\begin{matrix}{s_{a}(0)s_{b}(0)\,{mid}(0)\,z^{m} = (a[0] - a[1])(b[1] - b[0])\,z^{m}} \\ {= (1 - D)(A - 3)\,z^{m} = -54\,z^{m} = -540} \\ {s_{a}(1)s_{b}(1)\,{mid}(1)\,z^{3m} = (a[2] - a[3])(b[3] - b[2])\,z^{3m}} \\ {= (3 - F)(6 - C)\,z^{3m} = 48\,z^{3m} = 48000}\end{matrix}$

[0168] Steps 16 to 21 (1st Iteration)

[0169] Once every term in iteration relation (19) has been computed, the weighted sum sumP₁ can be obtained by adding the terms as follows: $\begin{array}{r}{628E53} \\ {-\;540} \\ {+\;48000} \\ \hline {{sumP}_{1} = 670913}\end{array}$
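The arithmetic of this first iteration can be replayed in Python. The sketch below is illustrative, assumes word lists with the least significant word first, and uses hypothetical variable names; the signed differences stand in for the product s_(a)(i)s_(b)(i)mid(i).

```python
W = 4  # word size in bits, so z = 2**W
m = 1  # m = n / 2^k with n = 4 and k = 2

a = [0x1, 0xD, 0x3, 0xF]  # F3D1, least significant word first
b = [0x3, 0xA, 0xC, 0x6]  # 6CA3, least significant word first
sum_p2 = 0x5CC23          # first weighted sum from Steps 3 to 7

# Steps 8 and 9: (1 + z^m) * sumP_2
shifted = sum_p2 + (sum_p2 << (W * m))

# Steps 11 to 15: signed mid terms for i = 0 and i = 1
mid0 = (a[0] - a[1]) * (b[1] - b[0])  # (1 - D)(A - 3) = -54 hex
mid1 = (a[2] - a[3]) * (b[3] - b[2])  # (3 - F)(6 - C) = 48 hex

# Steps 16 to 21: add the terms, shifted by z^m and z^(3m)
sum_p1 = shifted + (mid0 << (W * m)) + (mid1 << (W * 3 * m))

print(f"{shifted:X} {sum_p1:X}")  # prints 628E53 670913
```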

[0170] FIG. 8 illustrates these steps in terms of the equivalent recursion tree. As seen in FIG. 8, the two additional subproducts that are added in the weighted sum sumP₁ correspond to branches 132, 138. Moreover, branches 132, 138 correspond to the mid-branch children from the high branch 122 and the low branch 126. The amount of shifting performed on each subproduct 132, 138 is related to the position of the subproducts within the recursion tree.

[0171] Steps 8 and 9 (2nd Iteration)

[0172] In the second iteration, k=log₂ n−1=1 and m=n/2^(k)=2. Thus, (1+z^(m))sumP_(k)=(1+z²)sumP₁ is computed. For this, sumP₁ is shifted by two words and added to itself as shown below. $\begin{array}{r}{670913} \\ {+\;67091300} \\ \hline {67701C13}\end{array}$

[0173] Steps 11 to 15 (2nd Iteration)

[0174] For k=1 and m=2, s_(a)(i)s_(b)(i)mid(i) z^((2i+1)m) is computed for i=0, . . . , 2^(k−1)−1. $\begin{matrix}{s_{a}(0)s_{b}(0)\,{mid}(0)\,z^{m} = (a^{2}[0] - a^{2}[2])(b^{2}[2] - b^{2}[0])\,z^{m}} \\ {= (D1 - F3)(6C - A3)\,z^{m} = (-22)(-37)\,z^{m}} \\ {= 74E\,z^{m} = 74E00}\end{matrix}$

[0175] To compute the product of (−22)(−37), Step 15 includes a recursive call to the KOA2^(k) algorithm. The details of this recursive call are omitted from the above equation, but are shown in FIG. 9. In particular, FIG. 9 shows that s_(a)(0)s_(b)(0)mid(0)z^(m) corresponds to branch 124 from the equivalent recursion tree in FIG. 1. Thus, to determine the product of (−22)(−37), the KOA2^(k) is performed with a=22 and b=37, where s_(a)(0) and s_(b)(0) ensure the proper sign of the result. In process block 700, the weighted sum sumP₁ is calculated using the products from branches 134, 136. Then, in order to calculate the next weighted sum sumP₀, the value of mid(0) is determined at process block 702. As seen from process block 702, the value of mid(0) in this case is zero. At process block 704, the weighted sum sumP₀ is determined according to Equation (19). The result of the weighted sum is “74E,” which is returned for use in Steps 16 to 21 of the earlier iteration.
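The two-word recursive call described above can be traced in Python. This is a sketch under the same assumptions as the worked example (4-bit words, least significant word first); the variable names are hypothetical and do not come from the figures.

```python
W = 4  # word size in bits, so z = 2**W

# Magnitudes passed to the recursive call, least significant word first.
a = [0x2, 0x2]  # 22
b = [0x7, 0x3]  # 37

# Weighted sum of the two word-by-word subproducts: 0E + 60 = 6E
sum_p1 = a[0] * b[0] + ((a[1] * b[1]) << W)

# Signed mid term: (2 - 2)(3 - 7) = 0
mid0 = (a[0] - a[1]) * (b[1] - b[0])

# One iteration with m = 1: (1 + z) * sumP_1 + mid0 * z
sum_p0 = sum_p1 + (sum_p1 << W) + (mid0 << W)

print(f"{sum_p1:X} {mid0:X} {sum_p0:X}")  # prints 6E 0 74E
```

Because the two words of a are equal, the mid term vanishes and the result follows from the shifted addition alone.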

[0176] Steps 16 to 21 (2nd Iteration)

[0177] At this point, every term in the iteration relation (19) has been computed for k=1 and m=2. Adding these terms, sumP₀ is obtained as follows: $\begin{array}{r}{67701C13} \\ {+\;74E00} \\ \hline {{sumP}_{0} = 67776A13}\end{array}$

[0178] sumP₀ is the result of the multiplication.

[0179] FIG. 10 illustrates the computation of the weighted sum sumP₀ and the result “67776A13.” Note that the final result is the same as the one shown in FIG. 2 illustrating the KOA.
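The complete example can be reproduced with a short Python sketch of the iteration in Equation (19). This is an illustrative reading of the method, not the step-numbered procedure itself: the function names, the list-of-words representation (least significant word first), and the use of Python integers for the accumulated sum are all assumptions made for brevity.

```python
def to_words(x, n, w):
    """Split a non-negative integer into n words of w bits, least significant first."""
    mask = (1 << w) - 1
    return [(x >> (w * j)) & mask for j in range(n)]


def to_int(words, w):
    """Reassemble a list of w-bit words (least significant first) into an integer."""
    return sum(word << (w * j) for j, word in enumerate(words))


def koa2k(a, b, w):
    """Multiply two n-word operands, n a power of two, and return the product."""
    n = len(a)
    assert n >= 1 and n == len(b) and n & (n - 1) == 0, "n must be a power of two"
    if n == 1:
        return a[0] * b[0]

    # First weighted sum: sum over i of a[i] * b[i] * z^i.
    sum_p = sum((a[i] * b[i]) << (w * i) for i in range(n))

    m = 1  # m = n / 2^k, starting from k = log2(n)
    while m < n:
        # (1 + z^m) * sumP_k
        nxt = sum_p + (sum_p << (w * m))
        for i in range(n // (2 * m)):  # i = 0 .. 2^(k-1) - 1
            lo, hi = 2 * i * m, (2 * i + 1) * m
            da = to_int(a[lo:lo + m], w) - to_int(a[hi:hi + m], w)
            db = to_int(b[hi:hi + m], w) - to_int(b[lo:lo + m], w)
            # mid(i): product of the magnitudes, obtained by a recursive call.
            mid = koa2k(to_words(abs(da), m, w), to_words(abs(db), m, w), w)
            sign = -1 if (da < 0) != (db < 0) else 1
            nxt += sign * (mid << (w * (2 * i + 1) * m))
        sum_p = nxt
        m *= 2
    return sum_p  # sumP_0


product = koa2k(to_words(0xF3D1, 4, 4), to_words(0x6CA3, 4, 4), 4)
print(f"{product:X}")  # prints 67776A13
```

A word-level implementation would keep sumP in a word array and propagate carries explicitly; the integer accumulator here only keeps the sketch short.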

[0180] Applications of KOA2^(k)

[0181] The methods described above may be used in a variety of different applications wherein multiplication of multi-precision numbers is performed. For example, the methods may be used in a software program that performs arbitrary-precision arithmetic (e.g., Mathematica) or in other specialized or general-purpose software implementations. Additionally, the methods may be used in the field of cryptography, which often involves the manipulation of large multi-precision numbers. For example, the methods may be used to at least partially perform the calculation of a variety of different cryptographic parameters. These cryptographic parameters may include, for instance, a public key, a private key, a ciphertext, a plaintext, a digital signature, or a combination of these parameters. Cryptographic systems that may benefit from the disclosed methods and apparatus include, but are not limited to, systems using the RSA algorithm, the Diffie-Hellman key exchange algorithm, the Digital Signature Standard (DSS), elliptic curves, the Elliptic Curve Digital Signature Algorithm (ECDSA), or other algorithms. In one particular implementation, the methods are used, at least in part, to generate and verify a key pair or to generate and verify a signature according to the ECDSA. For example, the methods may be used to compute Q=dG during the key-pair generation process, wherein Q is a public key, d is a private key, and G is a base point. Moreover, the methods may be used to verify that nQ=O during the key pair verification process, wherein n is the order of the point G, and O is the point at infinity of the elliptic curve. Similarly, the methods may be used to compute kG=(x₁, y₁), wherein k is a random or pseudorandom integer and (x₁, y₁) is a point on an elliptic curve. The methods may similarly be used to calculate the related modular, inverse modular, and hash functions during the signature generation and verification processes.

[0182] Any of the methods described above may be implemented in a number of different hardware and/or software environments. FIG. 11 shows a block diagram of one exemplary general hardware implementation. More particularly, FIG. 11 shows a multiplying apparatus 800 (e.g., a computer) that includes a processor 810 (e.g., a microprocessor), memory 812 (e.g., RAM or ROM) and an input data path 814. The multiplication algorithm may be stored in the memory or on a computer-readable medium (e.g., hard disk, CD-ROM, DVD, floppy disk, RAM, ROM) that is separate from the memory 812 and that is accessed by the processor 810 before or during execution of the algorithm. During operation, the input operands may be supplied via the input data path 814 or by the memory 812. The processor 810 and the memory 812 are coupled together via the data paths 816, which enable the various read and write operations performed during the algorithm. The final product computed by the processor 810 may be output from the processor on output data path 816 or stored in the memory 812 for later use. The details of this general hardware implementation are omitted.

[0183] The disclosed methods may also be implemented in dedicated digital circuits configured to perform multi-precision multiplication. For instance, FIG. 12 shows a circuit 820 that includes a multiplying circuit 830 (e.g., combinational logic and sequential memory elements) configured to perform the multi-precision multiplication. Two inputs 832, 833 may be used to input the operands a and b. Alternatively, the operands may be input sequentially via a single input, or in parallel via multiple input paths. The circuit 820 may be clocked to load the operands and to perform the multiplication operation. The result of the multiplication may be output on data path 834. The circuit 820 may be, for instance, a printed circuit board (PCB), a smart card, a field programmable gate array (FPGA), a field programmable system level integrated circuit (FPSLIC), an integrated circuit used in a System on Chip environment (SOC), or any other type of integrated circuit suited for implementing the algorithms described above.

[0184] As noted, the disclosed methods may be used in cryptography to help compute a variety of cryptographic parameters using multi-precision multiplication. FIG. 13 shows a block diagram of a general cryptographic apparatus 840 that may be used to multiply two operands to produce a cryptographic parameter. The apparatus 840 includes a cryptographic processor 850 used to perform the algorithm; memory 852 used to store the operands, the intermediate results, and computer-executable instructions for performing the algorithm; and an input data path 854. The apparatus 840 operates much like the apparatus described in FIG. 11, but produces a cryptographic parameter at its output 856. The cryptographic parameter may be related to or constitute a portion of a public key, private key, ciphertext, plaintext, digital signature, or some combination thereof. The parameter may also constitute a number of other values used in cryptography. The cryptographic apparatus 840 may be included in a variety of security applications. For instance, the apparatus 840 may be included in a secure transaction server used for financial transactions, confidential record storage, SmartCards, and cell phones. The dedicated circuit shown in FIG. 12 may similarly be implemented as part of a dedicated cryptographic system.

[0185] In view of the many possible implementations, it will be recognized that the illustrated embodiments include only examples and should not be taken as a limitation on the scope of the disclosed technology. Instead, the invention is intended to encompass all alternatives, modifications, and equivalents as may be included within the spirit and scope of the technology defined by the following claims.

What is claimed is:
 1. A method of multiplying a first operand and a second operand, comprising: storing the first operand as a first array of n words, and storing the second operand as a second array of n words, wherein n is an integer that is a power of two; determining a first weighted sum from multiple subproducts of corresponding words of the first operand and the second operand; and iteratively determining a next weighted sum from a previous weighted sum and a recursively calculated intermediate product.
 2. The method of claim 1, wherein the first weighted sum is a previous weighted sum.
 3. The method of claim 1, wherein the corresponding words of the first operand and the second operand are associated with a selected power of a radix.
 4. The method of claim 1, wherein determining the first weighted sum includes word-shifting at least one of the multiple subproducts.
 5. The method of claim 1, wherein the multiple subproducts correspond to low or high branches having no mid-branch ancestors of a corresponding recursion tree.
 6. The method of claim 1, wherein a number of iterations is log₂ n.
 7. The method of claim 1, wherein the next weighted sum further includes a shifted version of the previous weighted sum.
 8. The method of claim 1, wherein the storing comprises padding at least one of the operands to form n words.
 9. A method of generating a cryptographic key pair, comprising: receiving a private key; receiving a base point, the private key and the base point comprising multi-precision numbers having n words, where n is an integer that is a power of two; and multiplying the private key by the base point to obtain a multi-precision number associated with a public key, the multiplying being performed according to the method of claim 1.
 10. A digital signature process, comprising: receiving a first cryptographic parameter and a second cryptographic parameter, each cryptographic parameter being a multi-precision number having n words, where n is an integer that is a power of two; and multiplying the first cryptographic parameter by the second cryptographic parameter using the method of claim 1.
 11. The method of claim 10, wherein the digital signature process is signature generation or signature verification associated with an elliptic curve digital signature.
 12. A field programmable gate array configured to perform the method of claim 1.
 13. An integrated circuit having combinational logic and memory elements configured to perform the method of claim 1.
 14. A computer-readable medium, comprising instructions for performing the method of claim 1.
 15. A method of multiplying a first operand and a second operand, comprising: storing the first operand and the second operand as n words, wherein n is an integer that is a power of two; determining multiple subproducts by multiplying words of the first operand with corresponding words of the second operand; shifting the multiple subproducts by an amount related to the respective positions of the multiplied words within the respective operands; and adding the shifted subproducts to obtain a weighted sum.
 16. The method of claim 15, further comprising performing at least one iterative computation in which the weighted sum is an addend.
 17. The method of claim 16, wherein a number of iterative computations performed is log₂ n.
 18. The method of claim 15, wherein the weighted sum is a first weighted sum, the method further comprising calculating a second weighted sum in which the first weighted sum is an addend.
 19. A method of generating a cryptographic key pair, comprising: receiving a private key; receiving a base point, the private key and the base point comprising multi-precision numbers having n words, where n is an integer that is a power of two; and multiplying the private key by the base point to obtain a multi-precision number associated with a public key, the multiplying being performed according to the method of claim 15.
 20. A signature generation or signature verification process, comprising: receiving a first cryptographic parameter and a second cryptographic parameter, each cryptographic parameter being a multi-precision number having n words, where n is an integer that is a power of two; and multiplying the first cryptographic parameter by the second cryptographic parameter using the method of claim 15.
 21. The method of claim 20, wherein the signature generation or signature verification is associated with an elliptic curve digital signature.
 22. A field programmable gate array configured to perform the method of claim 15.
 23. An integrated circuit having combinational logic and memory elements configured to perform the method of claim 15.
 24. A computer-readable medium, comprising instructions for performing the method of claim 15.
 25. A method of multiplying, comprising: obtaining a first operand a and a second operand b; storing the operands a and b as n words, n being an integer that is a power of two, wherein the operands a and b can be written in radix z as $a = \sum\limits_{i = 0}^{n - 1} a[i]\, z^{i}$ and $b = \sum\limits_{i = 0}^{n - 1} b[i]\, z^{i},$

where i is an index integer and a[i] indicates the ith word in the operand a and b[i] indicates the ith word in the operand b; computing sumP_(log₂ n), wherein sumP_(log₂ n) is equal to $\sum\limits_{i = 0}^{n - 1} a[i] * b[i]\, z^{i};$

storing sumP_(log₂ n); iteratively computing sumP_(k−1) from sumP_(k) for k=log₂ n to k=1, wherein: ${sumP}_{k - 1} = \left( 1 + z^{m} \right){sumP}_{k} + \sum\limits_{i = 0}^{2^{k - 1} - 1} s_{a}(i)\, s_{b}(i)\, {mid}(i)\, z^{(2i + 1)m},$

where m=n/2^(k), and mid(i)=|a^(m)[2im]−a^(m)[(2i+1)m]| |b^(m)[(2i+1)m]−b^(m)[2im]|, s_(a)(i)=sign(a^(m)[2im]−a^(m)[(2i+1)m]), and s_(b)(i)=sign(b^(m)[(2i+1)m]−b^(m)[2im]); and after each iteration, storing sumP_(k−1).
 26. The method of claim 25, further comprising returning sumP₀.
 27. The method of claim 25, wherein a number of iterations is log₂ n.
 28. The method of claim 25, further comprising computing a[0] * b[0] if n=1.
 29. A method of generating a cryptographic key pair, comprising: receiving a private key; receiving a base point, the private key and the base point comprising multi-precision numbers having n words, where n is an integer that is a power of two; and multiplying the private key by the base point to obtain a multi-precision number associated with a public key, the multiplying being performed according to the method of claim 25.
 30. A signature generation or signature verification process, comprising: receiving a first cryptographic parameter and a second cryptographic parameter, each cryptographic parameter being a multi-precision number having n words, where n is an integer that is a power of two; and multiplying the first cryptographic parameter by the second cryptographic parameter using the method of claim 25.
 31. The method of claim 30, wherein the signature generation or signature verification is associated with an elliptic curve digital signature.
 32. A field programmable gate array configured to perform the method of claim 25.
 33. An integrated circuit having combinational logic and memory elements configured to perform the method of claim 25.
 34. A computer-readable medium, comprising instructions for performing the method of claim 25.
 35. A method of multiplying a first operand and a second operand, comprising: storing the first operand as a first array of n words, and storing the second operand as a second array of n words, wherein n is an integer that is a power of two; determining multiple subproducts by multiplying words of the first operand with corresponding words of the second operand, at least two of the subproducts corresponding to independent branches of a corresponding recursion tree; and combining the subproducts into a weighted sum.
 36. The method of claim 35, wherein the corresponding words of the second operand have a same position relative to a radix as the words of the first operand.
 37. The method of claim 35, wherein the subproducts included in the weighted sum are multiplied by a power of a related radix.
 38. The method of claim 35, wherein the independent branches are low or high branches that have no mid-branch ancestors from the equivalent recursion tree.
 39. The method of claim 35, wherein the weighted sum further includes a recursively calculated mid branch from the equivalent recursion tree.
 40. A field programmable gate array configured to perform the method of claim 35.
 41. An integrated circuit having combinational logic and memory elements configured to perform the method of claim 35.
 42. A computer-readable medium, comprising instructions for performing the method of claim 35.
 43. A cryptographic method, comprising: receiving a first operand and a second operand; storing the first operand and the second operand as a first and a second array, respectively, of n words, wherein n is an integer that is a power of two; determining a first weighted sum from multiple subproducts of corresponding words of the first operand and the second operand; iteratively determining a next weighted sum from a previous weighted sum and a recursively calculated intermediate product; and outputting a cryptographic parameter.
 44. The method of claim 43, wherein the cryptographic parameter is the value of the next weighted sum after the predetermined number of iterations.
 45. The method of claim 43, wherein the first weighted sum is the previous weighted sum.
 46. The method of claim 43, wherein the corresponding words of the first operand and the second operand are associated with a selected power of a radix.
 47. The method of claim 43, wherein determining a first weighted sum includes word-shifting at least one of the multiple subproducts.
 48. The method of claim 43, wherein the multiple subproducts correspond to low or high branches having no mid-branch ancestors of an equivalent recursion tree.
 49. The method of claim 43, wherein the predetermined number of iterations is log₂ n.
 50. The method of claim 43, wherein the next weighted sum further includes a shifted version of the previous weighted sum.
 51. The method of claim 43, wherein the recursively calculated intermediate product corresponds to a mid branch of a corresponding recursion tree.
 52. The method of claim 43, wherein at least one of the operands corresponds to a private key, and the cryptographic parameter is a public key.
 53. The method of claim 43, wherein the cryptographic parameter is used in digital signature generation or digital signature verification.
 54. The method of claim 53, wherein the digital signature generation process or digital signature verification process is part of an elliptic curve digital signature algorithm.
 55. An apparatus for multiplying a first and a second operand having n words, wherein n is an integer that is a power of two, comprising: means for obtaining the first operand and the second operand; means for determining a first weighted sum from multiple subproducts, the multiple subproducts being found by multiplying words of the first operand with corresponding words of the second operand; and means for iteratively determining a next weighted sum from a previous weighted sum and a recursively calculated intermediate product.